Optimally Computing Compressed Indexing Arrays Based on the Compact Directed Acyclic Word Graph

  • Conference paper
String Processing and Information Retrieval (SPIRE 2023)

Abstract

In this paper, we present the first study of the computational complexity of converting an automata-based text index structure, the Compact Directed Acyclic Word Graph (CDAWG) of size e for a text T of length n, into other text indexing structures for the same text that are suitable for highly repetitive texts: the run-length BWT of size r, the irreducible PLCP array of size r, and the quasi-irreducible LPF array of size e, as well as the lex-parse of size O(r) and the LZ77-parse of size z, where \(r, z \leqslant e\). As our main results, we show that these structures can be optimally computed from either the CDAWG for T stored in read-only memory or its self-index version of size e without a text, in O(e) worst-case time and O(e) words of working space. To obtain these results, we devise techniques for enumerating a particular subset of suffixes in the lexicographic and text orders using forward and backward search on the CDAWG, extending the result of Belazzougui et al. from 2015.
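To make the size parameters r and z concrete, here is a minimal Python sketch that computes them naively for a short text. It is not the paper's algorithm: it works directly on the text in quadratic or worse time rather than on the CDAWG in O(e) time. The function names (`bwt`, `count_runs`, `lz77_phrases`) are illustrative, the text is assumed to end with a unique sentinel `$` smaller than every other character, and the LZ77 variant shown (greedy, self-overlapping, with no extra trailing character) is one common definition that may differ in detail from the one used in the paper.

```python
def bwt(text: str) -> str:
    """Naive Burrows-Wheeler transform via sorted rotations.

    Assumes `text` ends with a unique sentinel '$' that is smaller
    than every other character.
    """
    n = len(text)
    rotations = sorted(text[i:] + text[:i] for i in range(n))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal characters in s (r when s is the BWT)."""
    return sum(1 for i in range(len(s)) if i == 0 or s[i] != s[i - 1])


def lz77_phrases(text: str) -> list[str]:
    """Greedy LZ77 factorization (assumed variant, see lead-in).

    Each phrase is the longest prefix of the remaining text that also
    occurs starting at an earlier position, or a single new character.
    """
    phrases = []
    i, n = 0, len(text)
    while i < n:
        longest = 0
        for j in range(i):  # candidate earlier starting positions
            length = 0
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            longest = max(longest, length)
        step = max(longest, 1)  # a fresh character forms a phrase of length 1
        phrases.append(text[i:i + step])
        i += step
    return phrases


text = "banana$"
transformed = bwt(text)
r = count_runs(transformed)
z = len(lz77_phrases(text))
print(transformed, r, z)
```

For `banana$` the transform is `annb$aa`, which has r = 5 runs, and the factorization is `b | a | n | ana | $`, so z = 5.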

Notes

  1. The n-th Thue-Morse word is \(\tau_n = \varphi^n(0)\) for the morphism \(\varphi(0) = 01\) and \(\varphi(1) = 10\).
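The definition in the note above can be sketched in a few lines of Python. The function name `thue_morse` is illustrative; the code simply applies the morphism n times to the word `0`.

```python
def thue_morse(n: int) -> str:
    """Return the n-th Thue-Morse word, i.e. the morphism applied n times to '0'."""
    morphism = {"0": "01", "1": "10"}
    word = "0"
    for _ in range(n):
        # Apply the morphism to every letter of the current word.
        word = "".join(morphism[letter] for letter in word)
    return word


for n in range(4):
    print(n, thue_morse(n))
```

Each application doubles the length, so the n-th word has length 2^n; for example, the third word is `01101001`.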

References

  1. Arimura, H., Inenaga, S., Kobayashi, Y., Nakashima, Y., Sue, M.: Optimally computing compressed indexing arrays based on the compact directed acyclic word graph. CoRR (2023). http://arxiv.org/abs/

  2. Bannai, H., Gawrychowski, P., Inenaga, S., Takeda, M.: Converting SLP to LZ78 in almost linear time. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 38–49. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38905-4_6

  3. Belazzougui, D., Cunial, F.: Representing the suffix tree with the CDAWG. In: CPM 2017. LIPIcs, vol. 78, pp. 7:1–7:13 (2017)

  4. Belazzougui, D., Cunial, F., Gagie, T., Prezza, N., Raffinot, M.: Composite repetition-aware data structures. In: Cicalese, F., Porat, E., Vaccaro, U. (eds.) CPM 2015. LNCS, vol. 9133, pp. 26–39. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19929-0_3

  5. Blumer, A., Blumer, J., Haussler, D., McConnell, R., Ehrenfeucht, A.: Complete inverted files for efficient text retrieval and analysis. JACM 34(3), 578–595 (1987)

  6. Brlek, S., Frosini, A., Mancini, I., Pergola, E., Rinaldi, S.: Burrows-Wheeler transform of words defined by morphisms. In: Colbourn, C.J., Grossi, R., Pisanti, N. (eds.) IWOCA 2019. LNCS, vol. 11638, pp. 393–404. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25005-8_32

  7. Crochemore, M., Ilie, L.: Computing longest previous factor in linear time and applications. IPL 106(2), 75–80 (2008)

  8. Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)

  9. Kärkkäinen, J., Manzini, G., Puglisi, S.J.: Permuted longest-common-prefix array. In: Kucherov, G., Ukkonen, E. (eds.) CPM 2009. LNCS, vol. 5577, pp. 181–192. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02441-2_17

  10. Kempa, D., Kociumaka, T.: Resolution of the Burrows-Wheeler transform conjecture. Commun. ACM 65(6), 91–98 (2022)

  11. Mantaci, S., Restivo, A., Rosone, G., Sciortino, M., Versari, L.: Measuring the clustering effect of BWT via RLE. Theoret. Comput. Sci. 698, 79–87 (2017)

  12. Navarro, G.: Indexing highly repetitive string collections, part II: Compressed indexes. ACM Comput. Surv. (CSUR) 54(2), 1–32 (2021)

  13. Navarro, G., Ochoa, C., Prezza, N.: On the approximation ratio of ordered parsings. IEEE Trans. Inf. Theory 67(2), 1008–1026 (2020)

  14. Radoszewski, J., Rytter, W.: On the structure of compacted subword graphs of Thue-Morse words and their applications. JDA 11, 15–24 (2012)

  15. Takagi, T., Goto, K., Fujishige, Y., Inenaga, S., Arimura, H.: Linear-size CDAWG: new repetition-aware indexing and grammar compression. In: Fici, G., Sciortino, M., Venturini, R. (eds.) SPIRE 2017. LNCS, vol. 10508, pp. 304–316. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67428-5_26

Acknowledgments

The authors thank the anonymous reviewers for their comments which greatly improved this paper. The first author is also grateful to Hideo Bannai for information on conversion between text indexes, and to Mitsuru Funakoshi for discussion on the sensitivity of text indexes.

Author information

Corresponding author

Correspondence to Hiroki Arimura.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Arimura, H., Inenaga, S., Kobayashi, Y., Nakashima, Y., Sue, M. (2023). Optimally Computing Compressed Indexing Arrays Based on the Compact Directed Acyclic Word Graph. In: Nardini, F.M., Pisanti, N., Venturini, R. (eds) String Processing and Information Retrieval. SPIRE 2023. Lecture Notes in Computer Science, vol 14240. Springer, Cham. https://doi.org/10.1007/978-3-031-43980-3_3

  • DOI: https://doi.org/10.1007/978-3-031-43980-3_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43979-7

  • Online ISBN: 978-3-031-43980-3

  • eBook Packages: Computer Science (R0)
