Abstract
In this paper, we present the first study of the computational complexity of converting an automata-based text index structure, the Compact Directed Acyclic Word Graph (CDAWG) of size e for a text T of length n, into other text indexing structures for the same text that are suited to highly repetitive texts: the run-length BWT of size r, the irreducible PLCP array of size r, the quasi-irreducible LPF array of size e, the lex-parse of size O(r), and the LZ77-parse of size z, where \(r, z \leqslant e\). As our main results, we show that each of these structures can be optimally computed in O(e) worst-case time and O(e) words of working space from either the CDAWG for T stored in read-only memory or its self-index version of size e, which does not store the text. To obtain these results, we devise techniques for enumerating a particular subset of suffixes in lexicographic and text orders using forward and backward search on the CDAWG, extending the results of Belazzougui et al. (2015).
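To make the size parameters r and z concrete, the following is a minimal brute-force sketch that counts them for a short text. It is not the O(e)-time conversion described above and does not build a CDAWG. The function names, the `$` terminator appended before computing the BWT, and the particular LZ77 variant (each phrase is the longest prefix of the remaining text with an earlier, possibly overlapping, occurrence, or a single fresh character) are assumptions made for illustration.

```python
def bwt(text: str, terminator: str = "$") -> str:
    """Naive BWT via sorted rotations; assumes the terminator is smaller than every character of text."""
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal characters, i.e. r when s is a BWT."""
    return sum(1 for i in range(len(s)) if i == 0 or s[i] != s[i - 1])


def lz77_phrases(text: str) -> list[str]:
    """Greedy LZ77-style parse (assumed variant, self-overlapping sources allowed)."""
    phrases = []
    i, n = 0, len(text)
    while i < n:
        length = 0
        # Extend while text[i:i+length+1] has an occurrence starting before position i.
        while i + length < n and text.find(text[i:i + length + 1], 0, i + length) != -1:
            length += 1
        length = max(length, 1)  # a character with no earlier occurrence forms its own phrase
        phrases.append(text[i:i + length])
        i += length
    return phrases


if __name__ == "__main__":
    t = "banana"
    print(bwt(t), count_runs(bwt(t)))              # BWT string and its number of runs r
    print(lz77_phrases(t), len(lz77_phrases(t)))   # phrases and their count z
```

Both routines take quadratic time or worse, so they are only suitable for checking small examples by hand.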
Notes
1. The n-th Thue-Morse word is \(\tau _n = \varphi ^n(0)\) for the morphism \(\varphi (0) = 01\) and \(\varphi (1) = 10\).
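The definition in this note can be sketched directly in code by applying the morphism n times to the word 0. The function name `thue_morse` is hypothetical and chosen only for illustration.

```python
def thue_morse(n: int) -> str:
    """Return the n-th Thue-Morse word, phi^n(0), with phi(0) = 01 and phi(1) = 10."""
    word = "0"
    for _ in range(n):
        # Apply the morphism phi to every character of the current word.
        word = "".join("01" if c == "0" else "10" for c in word)
    return word


if __name__ == "__main__":
    for k in range(4):
        print(k, thue_morse(k))
```

Each application of the morphism doubles the length, so the n-th word has length 2^n.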
References
Arimura, H., Inenaga, S., Kobayashi, Y., Nakashima, Y., Sue, M.: Optimally computing compressed indexing arrays based on the compact directed acyclic word graph. CoRR (2023). http://arxiv.org/abs/
Bannai, H., Gawrychowski, P., Inenaga, S., Takeda, M.: Converting SLP to LZ78 in almost linear time. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 38–49. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38905-4_6
Belazzougui, D., Cunial, F.: Representing the suffix tree with the CDAWG. In: CPM 2017. LIPIcs, vol. 78, pp. 7:1–7:13 (2017)
Belazzougui, D., Cunial, F., Gagie, T., Prezza, N., Raffinot, M.: Composite repetition-aware data structures. In: Cicalese, F., Porat, E., Vaccaro, U. (eds.) CPM 2015. LNCS, vol. 9133, pp. 26–39. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19929-0_3
Blumer, A., Blumer, J., Haussler, D., McConnell, R., Ehrenfeucht, A.: Complete inverted files for efficient text retrieval and analysis. JACM 34(3), 578–595 (1987)
Brlek, S., Frosini, A., Mancini, I., Pergola, E., Rinaldi, S.: Burrows-Wheeler transform of words defined by morphisms. In: Colbourn, C.J., Grossi, R., Pisanti, N. (eds.) IWOCA 2019. LNCS, vol. 11638, pp. 393–404. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25005-8_32
Crochemore, M., Ilie, L.: Computing longest previous factor in linear time and applications. IPL 106(2), 75–80 (2008)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)
Kärkkäinen, J., Manzini, G., Puglisi, S.J.: Permuted longest-common-prefix array. In: Kucherov, G., Ukkonen, E. (eds.) CPM 2009. LNCS, vol. 5577, pp. 181–192. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02441-2_17
Kempa, D., Kociumaka, T.: Resolution of the Burrows-Wheeler transform conjecture. Commun. ACM 65(6), 91–98 (2022)
Mantaci, S., Restivo, A., Rosone, G., Sciortino, M., Versari, L.: Measuring the clustering effect of BWT via RLE. Theoret. Comput. Sci. 698, 79–87 (2017)
Navarro, G.: Indexing highly repetitive string collections, part II: Compressed indexes. ACM Comput. Surv. (CSUR) 54(2), 1–32 (2021)
Navarro, G., Ochoa, C., Prezza, N.: On the approximation ratio of ordered parsings. IEEE Trans. Inf. Theory 67(2), 1008–1026 (2020)
Radoszewski, J., Rytter, W.: On the structure of compacted subword graphs of Thue-Morse words and their applications. JDA 11, 15–24 (2012)
Takagi, T., Goto, K., Fujishige, Y., Inenaga, S., Arimura, H.: Linear-size CDAWG: new repetition-aware indexing and grammar compression. In: Fici, G., Sciortino, M., Venturini, R. (eds.) SPIRE 2017. LNCS, vol. 10508, pp. 304–316. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67428-5_26
Acknowledgments
The authors thank the anonymous reviewers for their comments, which greatly improved this paper. The first author is also grateful to Hideo Bannai for information on conversion between text indexes, and to Mitsuru Funakoshi for discussions on the sensitivity of text indexes.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Arimura, H., Inenaga, S., Kobayashi, Y., Nakashima, Y., Sue, M. (2023). Optimally Computing Compressed Indexing Arrays Based on the Compact Directed Acyclic Word Graph. In: Nardini, F.M., Pisanti, N., Venturini, R. (eds) String Processing and Information Retrieval. SPIRE 2023. Lecture Notes in Computer Science, vol 14240. Springer, Cham. https://doi.org/10.1007/978-3-031-43980-3_3
DOI: https://doi.org/10.1007/978-3-031-43980-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43979-7
Online ISBN: 978-3-031-43980-3
eBook Packages: Computer Science, Computer Science (R0)