On the Complexity of Recognizing Wheeler Graphs

Algorithmica

Abstract

In recent years, several compressed indexes based on variants of the Burrows–Wheeler transform have been introduced. Some of these index structures far more complex than the single string originally handled by the FM-index (Ferragina and Manzini in J. ACM 52(4):552–581, https://doi.org/10.1145/1082036.1082039, 2005). As a result, there has been an increasing effort to understand the conditions under which such an indexing scheme is possible. This has led to the introduction of Wheeler graphs (Gagie et al. in Theor Comput Sci 698:67–78, https://doi.org/10.1016/j.tcs.2017.06.016, 2017). Gagie et al. showed that de Bruijn graphs, generalized compressed suffix arrays, and several other BWT-related structures can be represented as Wheeler graphs, and that Wheeler graphs can be indexed in a space-efficient way. Hence, being able to recognize whether a given graph is a Wheeler graph, or to approximate a given graph by a Wheeler graph, could have numerous applications in indexing. Here we resolve the open question of whether there exists an efficient algorithm for recognizing if a given graph is a Wheeler graph. We show:

  • The problem of recognizing whether a given graph \(G=(V, E)\) is a Wheeler graph is NP-complete for any edge label alphabet of size \(\sigma \ge 2\), even when G is a DAG. This holds even on a restricted subset of graphs called d-NFAs for \(d \ge 5\). This is in contrast to recent results demonstrating the problem can be solved in polynomial time for d-NFAs where \(d \le 2\). We also show that the recognition problem can be solved in linear time for \(\sigma =1\) on graphs without self-loops;

  • There exists a \(2^{e\log \sigma + O(n + e)}\)-time exact algorithm, where \(n = |V|\) and \(e = |E|\). This algorithm relies on graph isomorphism being computable in strictly sub-exponential time;

  • We define an optimization variant of the problem called Wheeler Graph Violation, abbreviated WGV, where the aim is to identify the smallest set of edges that have to be removed from a graph to obtain a Wheeler graph. We show WGV is APX-hard, even when G is a DAG, implying there exists a constant \(C > 1\) for which there is no C-approximation algorithm (unless P = NP). Also, conditioned on the Unique Games Conjecture, for all \(C > 1\), it is NP-hard to find a C-approximation, implying WGV is not in APX;

  • We define the Wheeler Subgraph problem, abbreviated WS, where the aim is to find the largest subgraph which is a Wheeler graph (the dual of WGV). In contrast to WGV, we give an \(O(\sigma )\)-approximation algorithm for the WS problem, implying it is in APX for \(\sigma = O(1)\).

The above findings suggest that most problems under this theme are computationally difficult. However, we identify a class of graphs for which the recognition problem is polynomial-time solvable, raising the question of which properties determine this problem’s difficulty.
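To make the recognition and WGV problems concrete, the following is a minimal brute-force sketch, not the exact algorithm or any reduction from this paper. It assumes the standard Wheeler graph definition of Gagie et al. [20]: there must be a total order on the nodes in which nodes of in-degree 0 come first and, for any two edges \((u, v)\) labeled \(a\) and \((u', v')\) labeled \(a'\), (i) \(a < a'\) implies \(v < v'\), and (ii) \(a = a'\) and \(u < u'\) implies \(v \le v'\). The function names and the edge representation `(source, target, label)` are hypothetical choices made for illustration. The running time is factorial in \(n\), so this is only usable on tiny graphs.

```python
from itertools import combinations, permutations


def is_wheeler_order(nodes, edges, order):
    """Return True if `order` (a sequence of all nodes) satisfies the
    Wheeler conditions for `edges`, given as (source, target, label)."""
    pos = {v: i for i, v in enumerate(order)}
    has_incoming = {v for (_, v, _) in edges}
    # Nodes with in-degree 0 must precede nodes with positive in-degree.
    zero_in = [pos[v] for v in nodes if v not in has_incoming]
    pos_in = [pos[v] for v in nodes if v in has_incoming]
    if zero_in and pos_in and max(zero_in) > min(pos_in):
        return False
    for (u, v, a) in edges:
        for (u2, v2, a2) in edges:
            # (i) a < a' implies v < v'
            if a < a2 and not pos[v] < pos[v2]:
                return False
            # (ii) a = a' and u < u' implies v <= v'
            if a == a2 and pos[u] < pos[u2] and not pos[v] <= pos[v2]:
                return False
    return True


def find_wheeler_order(nodes, edges):
    """Try every node permutation; return a valid order or None."""
    for order in permutations(nodes):
        if is_wheeler_order(nodes, edges, order):
            return list(order)
    return None


def min_violation(nodes, edges):
    """Brute-force WGV: the fewest edges to delete so that the
    remaining graph admits a Wheeler order."""
    edges = list(edges)
    for k in range(len(edges) + 1):
        for removed in combinations(range(len(edges)), k):
            kept = [e for i, e in enumerate(edges) if i not in removed]
            if find_wheeler_order(nodes, kept) is not None:
                return k
    return len(edges)  # not reached: the edgeless graph is always Wheeler


# A node reached by edges with two different labels violates condition (i),
# so this graph is not Wheeler, and deleting one edge repairs it.
bad_edges = [(0, 2, "a"), (1, 2, "b")]
print(find_wheeler_order([0, 1, 2], bad_edges))  # None
print(min_violation([0, 1, 2], bad_edges))       # 1
```

The checker itself runs in \(O(e^2)\) time for a given order, which illustrates why recognition is in NP: a valid order is a certificate that can be verified in polynomial time.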

References

  1. Aho, A.V., Corasick, M.J.: Efficient string matching: an aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975). https://doi.org/10.1145/360825.360855

  2. Alanko, J., D’Agostino, G., Policriti, A., Prezza, N.: Regular languages meet prefix sorting. In: Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5–8, 2020, pp. 911–930 (2020). https://doi.org/10.1137/1.9781611975994.55

  3. Alanko, J., D’Agostino, G., Policriti, A., Prezza, N.: Wheeler languages. Inf. Comput. (2021). https://doi.org/10.1016/j.ic.2021.104820

  4. Alanko, J.N., Gagie, T., Navarro, G., Benkner, L.S.: Tunneling on Wheeler graphs. In: Data Compression Conference, DCC 2019, Snowbird, UT, USA, March 26–29, 2019, pp. 122–131 (2019). https://doi.org/10.1109/DCC.2019.00020

  5. Babai, L., Luks, E.M.: Canonical labeling of graphs. In: Proceedings of the 15th Annual ACM Symposium on Theory of Computing, 25–27 April, 1983, Boston, Massachusetts, USA, pp. 171–183 (1983). https://doi.org/10.1145/800061.808746

  6. Belazzougui, D.: Succinct dictionary matching with no slowdown. In: Combinatorial Pattern Matching, 21st Annual Symposium, CPM 2010, New York, NY, USA, June 21–23, 2010. Proceedings, pp. 88–100 (2010). https://doi.org/10.1007/978-3-642-13509-5_9

  7. Booth, K.S.: PQ-tree algorithms. Technical report, California University, Livermore (USA). Lawrence Livermore Laboratory (1975)

  8. Bowe, A., Onodera, T., Sadakane, K., Shibuya, T.: Succinct de Bruijn graphs. In: Algorithms in Bioinformatics—12th International Workshop, WABI 2012, Ljubljana, Slovenia, September 10–12, 2012. Proceedings, pp. 225–235 (2012). https://doi.org/10.1007/978-3-642-33122-0_18

  9. Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm. SRC Research Report (1994)

  10. Chen, J., Liu, Y., Lu, S., O’Sullivan, B., Razgon, I.: A fixed-parameter algorithm for the directed feedback vertex set problem. J. ACM 55(5), 21:1–21:19 (2008). https://doi.org/10.1145/1411509.1411511

  11. Chiba, N., Nishizeki, T., Abe, S., Ozawa, T.: A linear algorithm for embedding planar graphs using PQ-trees. J. Comput. Syst. Sci. 30(1), 54–76 (1985). https://doi.org/10.1016/0022-0000(85)90004-2

  12. Claude, F., Navarro, G., Pereira, A.O.: The wavelet matrix: an efficient wavelet tree for large alphabets. Inf. Syst. 47, 15–32 (2015). https://doi.org/10.1016/j.is.2014.06.002

  13. Cotumaccio, N., Prezza, N.: On indexing and compressing finite automata. In: D. Marx (ed) Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, Virtual Conference, January 10–13, 2021, pp. 2585–2599. SIAM (2021). https://doi.org/10.1137/1.9781611976465.153

  14. De Bruijn, N.G.: A combinatorial problem. Koninklijke Nederlandse Akademie v. Wetenschappen 49(49), 758–764 (1946)

  15. Dujmovic, V., Wood, D.R.: On linear layouts of graphs. Discrete Math. Theor. Comput. Sci. 6(2), 339–358 (2004). http://dmtcs.episciences.org/317

  16. Equi, M., Grossi, R., Mäkinen, V., Tomescu, A.I.: On the complexity of string matching for graphs. In: C. Baier, I. Chatzigiannakis, P. Flocchini, S. Leonardi (eds) 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9–12, 2019, Patras, Greece, LIPIcs, vol. 132, pp. 55:1–55:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019). https://doi.org/10.4230/LIPIcs.ICALP.2019.55

  17. Ferragina, P., Luccio, F., Manzini, G., Muthukrishnan, S.: Compressing and indexing labeled trees, with applications. J. ACM 57(1), 4:1–4:33 (2009). https://doi.org/10.1145/1613676.1613680

  18. Ferragina, P., Manzini, G.: Indexing compressed text. J. ACM 52(4), 552–581 (2005). https://doi.org/10.1145/1082036.1082039

  19. Ferragina, P., Venturini, R.: The compressed permuterm index. ACM Trans. Algorithms 7(1), 10:1–10:21 (2010). https://doi.org/10.1145/1868237.1868248

  20. Gagie, T., Manzini, G., Sirén, J.: Wheeler graphs: a framework for BWT-based data structures. Theor. Comput. Sci. 698, 67–78 (2017). https://doi.org/10.1016/j.tcs.2017.06.016

  21. Ganguly, A., Shah, R., Thankachan, S.V.: pBWT: Achieving succinct data structures for parameterized pattern matching and related problems. In: Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16–19, pp. 397–407 (2017). https://doi.org/10.1137/1.9781611974782.25

  22. Gibney, D., Hoppenworth, G., Thankachan, S.V.: Simple reductions from Formula-SAT to pattern matching on labeled graphs and subtree isomorphism. In: H.V. Le, V. King (eds) 4th Symposium on Simplicity in Algorithms, SOSA 2021, Virtual Conference, January 11–12, 2021, pp. 232–242. SIAM (2021). https://doi.org/10.1137/1.9781611976496.26

  23. Gibney, D., Thankachan, S.V.: On the hardness and inapproximability of recognizing Wheeler graphs. In: 27th Annual European Symposium on Algorithms, ESA 2019, September 9–11, 2019, Munich/Garching, Germany, pp. 51:1–51:16 (2019). https://doi.org/10.4230/LIPIcs.ESA.2019.51

  24. Guruswami, V., Manokaran, R., Raghavendra, P.: Beating the random ordering is hard: inapproximability of maximum acyclic subgraph. In: 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2008, October 25–28, 2008, Philadelphia, PA, USA, pp. 573–582 (2008). https://doi.org/10.1109/FOCS.2008.51

  25. Haeupler, B., Tarjan, R.E.: Planarity algorithms via PQ-trees (extended abstract). Electr. Not. Discrete Math. 31, 143–149 (2008). https://doi.org/10.1016/j.endm.2008.06.029

  26. Heath, L.S., Pemmaraju, S.V.: Stack and queue layouts of directed acyclic graphs: Part II. SIAM J. Comput. 28(5), 1588–1626 (1999). https://doi.org/10.1137/S0097539795291550

  27. Heath, L.S., Pemmaraju, S.V., Trenk, A.N.: Stack and queue layouts of directed acyclic graphs: Part I. SIAM J. Comput. 28(4), 1510–1539 (1999). https://doi.org/10.1137/S0097539795280287

  28. Heath, L.S., Rosenberg, A.L.: Laying out graphs using queues. SIAM J. Comput. 21(5), 927–958 (1992). https://doi.org/10.1137/0221055

  29. Hon, W., Ku, T., Shah, R., Thankachan, S.V., Vitter, J.S.: Faster compressed dictionary matching. Theor. Comput. Sci. 475, 113–119 (2013). https://doi.org/10.1016/j.tcs.2012.10.050

  30. Jiang, H., Chauve, C., Zhu, B.: Breakpoint distance and PQ-trees. In: Combinatorial Pattern Matching, 21st Annual Symposium, CPM 2010, New York, NY, USA, June 21–23, 2010. Proceedings, pp. 112–124 (2010). https://doi.org/10.1007/978-3-642-13509-5_11

  31. Kann, V.: On the approximability of NP-complete optimization problems. Ph.D. thesis, Royal Institute of Technology Stockholm (1992)

  32. Landau, G.M., Parida, L., Weimann, O.: Gene proximity analysis across whole genomes via PQ trees. J. Comput. Biol. 12(10), 1289–1306 (2005). https://doi.org/10.1089/cmb.2005.12.1289

  33. Mantaci, S., Restivo, A., Rosone, G., Sciortino, M.: An extension of the Burrows–Wheeler transform and applications to sequence comparison and data compression. In: Combinatorial Pattern Matching, 16th Annual Symposium, CPM 2005, Jeju Island, Korea, June 19–22, 2005, Proceedings, pp. 178–189 (2005). https://doi.org/10.1007/11496656_16

  34. Mantaci, S., Restivo, A., Rosone, G., Sciortino, M.: An extension of the Burrows–Wheeler transform. Theor. Comput. Sci. 387(3), 298–312 (2007). https://doi.org/10.1016/j.tcs.2007.07.014

  35. Miller, G.L.: Graph isomorphism, general remarks. J. Comput. Syst. Sci. 18(2), 128–142 (1979). https://doi.org/10.1016/0022-0000(79)90043-6

  36. Novak, A.M., Garrison, E., Paten, B.: A graph extension of the positional Burrows–Wheeler transform and its applications. Algorithms Mol. Biol. 12(1), 18:1–18:12 (2017). https://doi.org/10.1186/s13015-017-0109-9

  37. Opatrny, J.: Total ordering problem. SIAM J. Comput. 8(1), 111–114 (1979). https://doi.org/10.1137/0208008

  38. Sirén, J., Välimäki, N., Mäkinen, V.: Indexing graphs for path queries with applications in genome research. IEEE ACM Trans. Comput. Biol. Bioinf. (TCBB) 11(2), 375–388 (2014)

  39. Younger, D.: Minimum feedback arc sets for a directed graph. IEEE Trans. Circuit Theory 10(2), 238–245 (1963)

Acknowledgements

This research is supported in part by the U.S. National Science Foundation under the Grants CCF-1703489 and CCF-2112643. The first author has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 690941. We thank Travis Gagie and Nicola Prezza for introducing this problem to us. We also thank the anonymous reviewers of ESA 2019, where a preliminary version of this paper was published [23]. The author would also like to acknowledge the BIRDS project (Bioinformatics and Information Retrieval Data Structures Analysis and Design) and the Workshop on Compression, Text and Algorithms 2018.

Funding

This research is supported in part by the U.S. National Science Foundation under the Grants CCF-1703489 and CCF-2112643. The first author has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 690941.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts/competing interests.

Consent for publication

The authors consent for this work to be published in Algorithmica.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gibney, D., Thankachan, S.V. On the Complexity of Recognizing Wheeler Graphs. Algorithmica 84, 784–814 (2022). https://doi.org/10.1007/s00453-021-00917-5

  • DOI: https://doi.org/10.1007/s00453-021-00917-5
