Abstract
In recent years, several compressed indexes based on variants of the Burrows–Wheeler transform have been introduced. Some of these index structures far more complex than the single string originally indexed by the FM-index (Ferragina and Manzini in J. ACM 52(4):552–581, https://doi.org/10.1145/1082036.1082039, 2005). As such, there has been an increasing effort to better understand the conditions under which such an indexing scheme is possible. This has led to the introduction of Wheeler graphs (Gagie et al. in Theor Comput Sci 698:67–78, https://doi.org/10.1016/j.tcs.2017.06.016, 2017). Gagie et al. showed that de Bruijn graphs, generalized compressed suffix arrays, and several other BWT-related structures can be represented as Wheeler graphs, and that Wheeler graphs can be indexed in a space-efficient way. Hence, being able to recognize whether a given graph is a Wheeler graph, or being able to approximate a given graph by a Wheeler graph, could have numerous applications in indexing. Here we resolve the open question of whether there exists an efficient algorithm for recognizing whether a given graph is a Wheeler graph. We show:
- The problem of recognizing whether a given graph \(G=(V, E)\) is a Wheeler graph is NP-complete for any edge label alphabet of size \(\sigma \ge 2\), even when \(G\) is a DAG. This holds even on a restricted subset of graphs called \(d\)-NFAs for \(d \ge 5\). This is in contrast to recent results demonstrating the problem can be solved in polynomial time for \(d\)-NFAs where \(d \le 2\). We also show that the recognition problem can be solved in linear time for \(\sigma =1\) on graphs without self-loops;
- There exists a \(2^{e\log \sigma + O(n + e)}\) time exact algorithm where \(n = |V|\) and \(e = |E|\). This algorithm relies on graph isomorphism being computable in strictly sub-exponential time;
- We define an optimization variant of the problem called Wheeler Graph Violation, abbreviated WGV, where the aim is to identify the smallest set of edges that have to be removed from a graph to obtain a Wheeler graph. We show WGV is APX-hard, even when \(G\) is a DAG, implying there exists a constant \(C > 1\) for which there is no \(C\)-approximation algorithm (unless P = NP). Also, conditioned on the Unique Games Conjecture, for all \(C > 1\), it is NP-hard to find a \(C\)-approximation, implying WGV is not in APX;
- We define the Wheeler Subgraph problem, abbreviated WS, where the aim is to find the largest subgraph which is a Wheeler graph (the dual of WGV). In contrast to WGV, we give an \(O(\sigma )\)-approximation algorithm for the WS problem, implying it is in APX for \(\sigma = O(1)\).
The above findings suggest that most problems under this theme are computationally difficult. However, we identify a class of graphs for which the recognition problem is polynomial-time solvable, raising the question of which properties determine this problem’s difficulty.
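To make the recognition and WGV problems concrete, the following is a minimal Python sketch, not the algorithms of the paper. It assumes the definition of a Wheeler graph from Gagie et al. (2017), which the abstract does not restate: there must be a total order on the nodes in which nodes with in-degree 0 come first and, for any two edges \((u, v)\) and \((u', v')\) with labels \(a\) and \(a'\), (i) \(a < a'\) implies \(v < v'\), and (ii) \(a = a'\) and \(u < u'\) imply \(v \le v'\). The function names `is_wheeler_order`, `find_wheeler_order`, and `min_violation` are hypothetical. Checking a candidate order takes polynomial time, while the two searches are brute force and exponential, so they are only usable on tiny graphs.

```python
from itertools import combinations, permutations


def is_wheeler_order(order, edges):
    """Check a candidate node order against the Wheeler conditions.

    order: sequence containing every node exactly once.
    edges: list of (source, target, label) triples with comparable labels.
    """
    pos = {node: i for i, node in enumerate(order)}
    has_incoming = {v for _, v, _ in edges}

    # Nodes with in-degree 0 must precede all nodes with positive in-degree.
    seen_incoming = False
    for node in order:
        if node in has_incoming:
            seen_incoming = True
        elif seen_incoming:
            return False

    # Pairwise conditions on edges (quadratic in the number of edges).
    for u1, v1, a1 in edges:
        for u2, v2, a2 in edges:
            if a1 < a2 and not pos[v1] < pos[v2]:
                return False
            if a1 == a2 and pos[u1] < pos[u2] and not pos[v1] <= pos[v2]:
                return False
    return True


def find_wheeler_order(nodes, edges):
    """Brute-force recognition: try every permutation of the nodes."""
    for order in permutations(nodes):
        if is_wheeler_order(order, edges):
            return list(order)
    return None


def min_violation(nodes, edges):
    """Brute-force WGV: fewest edges whose removal leaves a Wheeler graph."""
    for k in range(len(edges) + 1):
        for removed in combinations(range(len(edges)), k):
            kept = [e for i, e in enumerate(edges) if i not in removed]
            if find_wheeler_order(nodes, kept) is not None:
                return k


# A labeled path 0 -a-> 1 -b-> 2 admits the order [0, 1, 2].
path = [(0, 1, "a"), (1, 2, "b")]
# Node 2 receives edges with two different labels, so no order exists.
clash = [(0, 2, "a"), (1, 2, "b")]
```

In this sketch, `find_wheeler_order([0, 1, 2], path)` returns `[0, 1, 2]`, `find_wheeler_order([0, 1, 2], clash)` returns `None`, and `min_violation([0, 1, 2], clash)` returns `1`, since deleting either edge into node 2 leaves a Wheeler graph.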
References
Aho, A.V., Corasick, M.J.: Efficient string matching: an aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975). https://doi.org/10.1145/360825.360855
Alanko, J., D’Agostino, G., Policriti, A., Prezza, N.: Regular languages meet prefix sorting. In: Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5–8, 2020, pp. 911–930 (2020). https://doi.org/10.1137/1.9781611975994.55
Alanko, J., D’Agostino, G., Policriti, A., Prezza, N.: Wheeler languages. Inf. Comput. (2021). https://doi.org/10.1016/j.ic.2021.104820
Alanko, J.N., Gagie, T., Navarro, G., Benkner, L.S.: Tunneling on wheeler graphs. In: Data Compression Conference, DCC 2019, Snowbird, UT, USA, March 26–29, 2019, pp. 122–131 (2019). https://doi.org/10.1109/DCC.2019.00020
Babai, L., Luks, E.M.: Canonical labeling of graphs. In: Proceedings of the 15th Annual ACM Symposium on Theory of Computing, 25–27 April, 1983, Boston, Massachusetts, USA, pp. 171–183 (1983). https://doi.org/10.1145/800061.808746
Belazzougui, D.: Succinct dictionary matching with no slowdown. In: Combinatorial Pattern Matching, 21st Annual Symposium, CPM 2010, New York, NY, USA, June 21–23, 2010. Proceedings, pp. 88–100 (2010). https://doi.org/10.1007/978-3-642-13509-5_9
Booth, K.S.: Pq-tree algorithms. Technical report, California University, Livermore (USA). Lawrence Livermore Laboratory (1975)
Bowe, A., Onodera, T., Sadakane, K., Shibuya, T.: Succinct de bruijn graphs. In: Algorithms in Bioinformatics—12th International Workshop, WABI 2012, Ljubljana, Slovenia, September 10–12, 2012. Proceedings, pp. 225–235 (2012). https://doi.org/10.1007/978-3-642-33122-0_18
Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm. SRC Research Report (1994)
Chen, J., Liu, Y., Lu, S., O’Sullivan, B., Razgon, I.: A fixed-parameter algorithm for the directed feedback vertex set problem. J. ACM 55(5), 21:1-21:19 (2008). https://doi.org/10.1145/1411509.1411511
Chiba, N., Nishizeki, T., Abe, S., Ozawa, T.: A linear algorithm for embedding planar graphs using pq-trees. J. Comput. Syst. Sci. 30(1), 54–76 (1985). https://doi.org/10.1016/0022-0000(85)90004-2
Claude, F., Navarro, G., Pereira, A.O.: The wavelet matrix: an efficient wavelet tree for large alphabets. Inf. Syst. 47, 15–32 (2015). https://doi.org/10.1016/j.is.2014.06.002
Cotumaccio, N., Prezza, N.: On indexing and compressing finite automata. In: D. Marx (ed) Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, Virtual Conference, January 10–13, 2021, pp. 2585–2599. SIAM (2021). https://doi.org/10.1137/1.9781611976465.153
De Bruijn, N.G.: A combinatorial problem. Koninklijke Nederlandse Akademie v. Wetenschappen 49(49), 758–764 (1946)
Dujmovic, V., Wood, D.R.: On linear layouts of graphs. Discrete Math. Theor. Comput. Sci. 6(2), 339–358 (2004). http://dmtcs.episciences.org/317
Equi, M., Grossi, R., Mäkinen, V., Tomescu, A.I.: On the complexity of string matching for graphs. In: C. Baier, I. Chatzigiannakis, P. Flocchini, S. Leonardi (eds) 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9–12, 2019, Patras, Greece, LIPIcs, vol. 132, pp. 55:1–55:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019). https://doi.org/10.4230/LIPIcs.ICALP.2019.55
Ferragina, P., Luccio, F., Manzini, G., Muthukrishnan, S.: Compressing and indexing labeled trees, with applications. J. ACM 57(1), 4:1-4:33 (2009). https://doi.org/10.1145/1613676.1613680
Ferragina, P., Manzini, G.: Indexing compressed text. J. ACM 52(4), 552–581 (2005). https://doi.org/10.1145/1082036.1082039
Ferragina, P., Venturini, R.: The compressed permuterm index. ACM Trans. Algorithms 7(1), 10:1-10:21 (2010). https://doi.org/10.1145/1868237.1868248
Gagie, T., Manzini, G., Sirén, J.: Wheeler graphs: a framework for bwt-based data structures. Theor. Comput. Sci. 698, 67–78 (2017). https://doi.org/10.1016/j.tcs.2017.06.016
Ganguly, A., Shah, R., Thankachan, S.V.: pbwt: Achieving succinct data structures for parameterized pattern matching and related problems. In: Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16–19, pp. 397–407 (2017). https://doi.org/10.1137/1.9781611974782.25
Gibney, D., Hoppenworth, G., Thankachan, S.V.: Simple reductions from formula-sat to pattern matching on labeled graphs and subtree isomorphism. In: H.V. Le, V. King (eds) 4th Symposium on Simplicity in Algorithms, SOSA 2021, Virtual Conference, January 11–12, 2021, pp. 232–242. SIAM (2021). https://doi.org/10.1137/1.9781611976496.26
Gibney, D., Thankachan, S.V.: On the hardness and inapproximability of recognizing wheeler graphs. In: 27th Annual European Symposium on Algorithms, ESA 2019, September 9–11, 2019, Munich/Garching, Germany, pp. 51:1–51:16 (2019). https://doi.org/10.4230/LIPIcs.ESA.2019.51
Guruswami, V., Manokaran, R., Raghavendra, P.: Beating the random ordering is hard: inapproximability of maximum acyclic subgraph. In: 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2008, October 25–28, 2008, Philadelphia, PA, USA, pp. 573–582 (2008). https://doi.org/10.1109/FOCS.2008.51
Haeupler, B., Tarjan, R.E.: Planarity algorithms via pq-trees (extended abstract). Electr. Not. Discrete Math. 31, 143–149 (2008). https://doi.org/10.1016/j.endm.2008.06.029
Heath, L.S., Pemmaraju, S.V.: Stack and queue layouts of directed acyclic graphs: Part II. SIAM J. Comput. 28(5), 1588–1626 (1999). https://doi.org/10.1137/S0097539795291550
Heath, L.S., Pemmaraju, S.V., Trenk, A.N.: Stack and queue layouts of directed acyclic graphs: Part I. SIAM J. Comput. 28(4), 1510–1539 (1999). https://doi.org/10.1137/S0097539795280287
Heath, L.S., Rosenberg, A.L.: Laying out graphs using queues. SIAM J. Comput. 21(5), 927–958 (1992). https://doi.org/10.1137/0221055
Hon, W., Ku, T., Shah, R., Thankachan, S.V., Vitter, J.S.: Faster compressed dictionary matching. Theor. Comput. Sci. 475, 113–119 (2013). https://doi.org/10.1016/j.tcs.2012.10.050
Jiang, H., Chauve, C., Zhu, B.: Breakpoint distance and pq-trees. In: Combinatorial Pattern Matching, 21st Annual Symposium, CPM 2010, New York, NY, USA, June 21–23, 2010. Proceedings, pp. 112–124 (2010). https://doi.org/10.1007/978-3-642-13509-5_11
Kann, V.: On the approximability of np-complete optimization problems. Ph.d. thesis, Royal Institute of Technology Stockholm (1992)
Landau, G.M., Parida, L., Weimann, O.: Gene proximity analysis across whole genomes via PQ trees. J. Comput. Biol. 12(10), 1289–1306 (2005). https://doi.org/10.1089/cmb.2005.12.1289
Mantaci, S., Restivo, A., Rosone, G., Sciortino, M.: An extension of the burrows wheeler transform and applications to sequence comparison and data compression. In: Combinatorial Pattern Matching, 16th Annual Symposium, CPM 2005, Jeju Island, Korea, June 19–22, 2005, Proceedings, pp. 178–189 (2005). https://doi.org/10.1007/11496656_16
Mantaci, S., Restivo, A., Rosone, G., Sciortino, M.: An extension of the Burrows–Wheeler transform. Theor. Comput. Sci. 387(3), 298–312 (2007). https://doi.org/10.1016/j.tcs.2007.07.014
Miller, G.L.: Graph isomorphism, general remarks. J. Comput. Syst. Sci. 18(2), 128–142 (1979). https://doi.org/10.1016/0022-0000(79)90043-6
Novak, A.M., Garrison, E., Paten, B.: A graph extension of the positional Burrows–Wheeler transform and its applications. Algorithms Mol. Biol. 12(1), 18:1-18:12 (2017). https://doi.org/10.1186/s13015-017-0109-9
Opatrny, J.: Total ordering problem. SIAM J. Comput. 8(1), 111–114 (1979). https://doi.org/10.1137/0208008
Sirén, J., Välimäki, N., Mäkinen, V.: Indexing graphs for path queries with applications in genome research. IEEE ACM Trans. Comput. Biol. Bioinf. (TCBB) 11(2), 375–388 (2014)
Younger, D.: Minimum feedback arc sets for a directed graph. IEEE Trans. Circuit Theory 10(2), 238–245 (1963)
Acknowledgements
This research is supported in part by the U.S. National Science Foundation under the Grants CCF-1703489 and CCF-2112643. The first author has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 690941. We thank Travis Gagie and Nicola Prezza for introducing this problem to us. We also thank the anonymous reviewers of ESA 2019, where a preliminary version of this paper was published [23]. The author would also like to acknowledge the BIRDS project (Bioinformatics and Information Retrieval Data Structures Analysis and Design) and the Workshop on Compression, Text and Algorithms 2018.
Funding
This research is supported in part by the U.S. National Science Foundation under the Grants CCF-1703489 and CCF-2112643. The first author has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 690941.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts/competing interests.
Consent for publication
The authors consent for this work to be published in Algorithmica.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Gibney, D., Thankachan, S.V. On the Complexity of Recognizing Wheeler Graphs. Algorithmica 84, 784–814 (2022). https://doi.org/10.1007/s00453-021-00917-5
DOI: https://doi.org/10.1007/s00453-021-00917-5