Near-Optimal Search Time in $\delta$-Optimal Space, and Vice Versa

Kociumaka, Tomasz; Navarro, Gonzalo; Olivares, Francisco

doi:10.1007/s00453-023-01186-0

Published: 06 November 2023

Algorithmica, Volume 86, pages 1031–1056 (2024)

Tomasz Kociumaka¹,
Gonzalo Navarro² &
Francisco Olivares²

Abstract

Two recent lower bounds on the compressibility of repetitive sequences, $\delta \le \gamma $, have received much attention. It has been shown that a length-$n$ string $S$ over an alphabet of size $\sigma $ can be represented within the optimal $O(\delta \log \tfrac{n\log \sigma }{\delta \log n})$ space, and further, that within that space one can find all the $occ$ occurrences in $S$ of any length-$m$ pattern in time $O(m\log n + occ \log ^\epsilon n)$ for any constant $\epsilon >0$. In contrast, the near-optimal search time $O(m+({occ+1})\log ^\epsilon n)$ has been achieved only within $O(\gamma \log \frac{n}{\gamma })$ space. The two results are based on considerably different locally consistent parsing techniques. The question of whether the better search time could be supported within the $\delta $-optimal space remained open. In this paper, we prove that both techniques can indeed be combined to obtain the best of both worlds: $O(m+({occ+1})\log ^\epsilon n)$ search time within $O(\delta \log \tfrac{n\log \sigma }{\delta \log n})$ space. Moreover, the number of occurrences can be computed in $O(m+\log ^{2+\epsilon }n)$ time within $O(\delta \log \tfrac{n\log \sigma }{\delta \log n})$ space. We also show that an extra sublogarithmic factor on top of this space enables optimal $O(m+occ)$ search time, whereas an extra logarithmic factor enables optimal $O(m)$ counting time.
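As a rough illustration of the measure behind the space bound, the following sketch computes $\delta$ by brute force. It assumes the standard definition $\delta = \max_k d_k(S)/k$, where $d_k(S)$ is the number of distinct length-$k$ substrings of $S$; this definition comes from the work of Kociumaka, Navarro, and Prezza listed in the references and is not restated in the abstract. The function names `substring_complexity` and `delta_space_bound` are hypothetical, and the second one merely evaluates the expression inside the $O(\cdot)$ with base-2 logarithms and no hidden constant. It is not the index construction of the paper.

```python
import math


def substring_complexity(s: str) -> float:
    """Brute-force delta(S) = max over k of d_k(S) / k.

    d_k(S) is the number of distinct substrings of S of length k.
    This takes roughly cubic time, so it is for illustration only.
    """
    n = len(s)
    best = 0.0
    for k in range(1, n + 1):
        # Collect every length-k substring in a set to count distinct ones.
        distinct = len({s[i:i + k] for i in range(n - k + 1)})
        best = max(best, distinct / k)
    return best


def delta_space_bound(n: int, sigma: int, delta: float) -> float:
    """Evaluate delta * log2(n * log2(sigma) / (delta * log2(n))).

    Assumption: base-2 logarithms and constant factor 1; the abstract
    only states the bound up to a constant inside O(.).
    """
    return delta * math.log2(n * math.log2(sigma) / (delta * math.log2(n)))


if __name__ == "__main__":
    text = "ab" * 8  # a highly repetitive string of length 16
    delta = substring_complexity(text)
    print("delta =", delta)
    print("bound =", delta_space_bound(len(text), 2, delta))
```

For `"ab" * 8`, only the substrings `a` and `b` occur at length 1 and only `ab` and `ba` at length 2, so the maximum ratio is reached at $k = 1$ and $\delta = 2$, even though the string has length 16.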

Notes

In this work, we assume that $P[1..m]$ is represented in $O(m)$ space. For small alphabets, the packed setting, where $P$ occupies $O(\lceil \frac{m\log \sigma }{\log n} \rceil )$ space, could also be considered; see [3, Sec. 2.2].
If $\delta \log n \le \sqrt{n}$, then $\delta \log \tfrac{n\log \sigma }{\delta \log \delta } \le \delta \log (n\log \sigma ) \le 2\delta \log (\sqrt{n} \log \sigma ) \le 2 \delta \log \tfrac{n\log \sigma }{\delta \log n}$. Otherwise, $\log \delta> \frac{1}{2}\log n-\log \log n > \frac{1}{4} \log n$, so $\delta \log \tfrac{n\log \sigma }{\delta \log \delta } \le \delta \log \tfrac{4n\log \sigma }{\delta \log n}=\delta \log \tfrac{n\log \sigma }{\delta \log n}+ 2\delta $.
The unproductive tests $u.\mathit{next} = \mathit{null}$ are charged to the primary occurrence $v$ if the label of $u$ is $A$, or otherwise to the first secondary occurrence of $u$, which exists by the definition of $u.\mathit{anc}$.

References

1. Stephens, Z.D., Lee, S.Y., Faghri, F., Campbell, R.H., Zhai, C., Efron, M.J., Iyer, R., Schatz, M.C., Sinha, S., Robinson, G.E.: Big data: astronomical or genomical? PLoS Biol. 13(7), e1002195 (2015). https://doi.org/10.1371/journal.pbio.1002195
2. Navarro, G.: Indexing highly repetitive string collections, part II: compressed indexes. ACM Comput. Surv. 54(2), 26:1–26:32 (2021). https://doi.org/10.1145/3432999
3. Navarro, G.: Indexing highly repetitive string collections, part I: repetitiveness measures. ACM Comput. Surv. 54(2), 29:1–29:31 (2021). https://doi.org/10.1145/3434399
4. Kreft, S., Navarro, G.: On compressing and indexing repetitive sequences. Theoret. Comput. Sci. 483, 115–133 (2013). https://doi.org/10.1016/j.tcs.2012.02.006
5. Kempa, D., Prezza, N.: At the roots of dictionary compression: string attractors. In: 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, pp. 827–840 (2018). https://doi.org/10.1145/3188745.3188814
6. Christiansen, A.R., Ettienne, M.B., Kociumaka, T., Navarro, G., Prezza, N.: Optimal-time dictionary-compressed indexes. ACM Trans. Algorithms 17(1), 8:1–8:39 (2021). https://doi.org/10.1145/3426473
7. Kociumaka, T., Navarro, G., Prezza, N.: Towards a definitive compressibility measure for repetitive sequences. IEEE Trans. Inf. Theory (2022). https://doi.org/10.1109/TIT.2022.3224382
8. Lempel, A., Ziv, J.: On the complexity of finite sequences. IEEE Trans. Inf. Theory 22(1), 75–81 (1976). https://doi.org/10.1109/TIT.1976.1055501
9. Claude, F., Navarro, G.: Improved grammar-based compressed indexes. In: 19th International Symposium on String Processing and Information Retrieval, SPIRE 2012. LNCS, vol. 7608, pp. 180–192 (2012). https://doi.org/10.1007/978-3-642-34109-0_19
10. Claude, F., Navarro, G., Pacheco, A.: Grammar-compressed indexes with logarithmic search time. J. Comput. Syst. Sci. 118, 53–74 (2021). https://doi.org/10.1016/j.jcss.2020.12.001
11. Mehlhorn, K., Sundar, R., Uhrig, C.: Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica 17(2), 183–198 (1997). https://doi.org/10.1007/BF02522825
12. Jeż, A.: A really simple approximation of smallest grammar. Theoret. Comput. Sci. 616, 141–150 (2016). https://doi.org/10.1016/j.tcs.2015.12.032
13. Kociumaka, T., Navarro, G., Olivares, F.: Near-optimal search time in $\delta $-optimal space. In: 15th Latin American Symposium on Theoretical Informatics, LATIN 2022. LNCS, vol. 13568, pp. 88–103 (2022). https://doi.org/10.1007/978-3-031-20624-5_6
14. Batu, T., Sahinalp, S.C.: Locally consistent parsing and applications to approximate string comparisons. In: 9th International Conference on Developments in Language Theory, DLT 2005. LNCS, vol. 3572, pp. 22–35 (2005). https://doi.org/10.1007/11505877_3
15. Cole, R., Vishkin, U.: Deterministic coin tossing and accelerating cascades: micro and macro techniques for designing parallel algorithms. In: 18th Annual ACM Symposium on Theory of Computing, STOC 1986, pp. 206–219 (1986). https://doi.org/10.1145/12130.12151
16. Raskhodnikova, S., Ron, D., Rubinfeld, R., Smith, A.D.: Sublinear algorithms for approximating string compressibility. Algorithmica 65(3), 685–709 (2013). https://doi.org/10.1007/s00453-012-9618-6
17. Kociumaka, T., Radoszewski, J., Rytter, W., Waleń, T.: Internal pattern matching queries in text and applications (2023). arXiv:1311.6235v5
18. Birenzwige, O., Golan, S., Porat, E.: Locally consistent parsing for text indexing in small space. In: 31st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, pp. 607–626 (2020). https://doi.org/10.1137/1.9781611975994.37
19. Kempa, D., Kociumaka, T.: Dynamic suffix array with polylogarithmic queries and updates. In: 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022, pp. 1657–1670 (2022). https://doi.org/10.1145/3519935.3520061
20. Sahinalp, S.C., Vishkin, U.: On a parallel-algorithms method for string matching problems (overview). In: 2nd Italian Conference on Algorithms and Complexity, CIAC 1994. LNCS, vol. 778, pp. 22–32 (1994). https://doi.org/10.1007/3-540-57811-0_3
21. Chan, T.M., Larsen, K.G., Pătraşcu, M.: Orthogonal range searching on the RAM, revisited. In: SoCG ’11, pp. 1–10. Association for Computing Machinery, New York (2011). https://doi.org/10.1145/1998196.1998198
22. Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987). https://doi.org/10.1147/rd.312.0249
23. Navarro, G.: Computing MEMs on repetitive text collections. In: 34th Annual Symposium on Combinatorial Pattern Matching, CPM 2023. LIPIcs, vol. 259, pp. 24:1–24:17 (2023). https://doi.org/10.4230/LIPIcs.CPM.2023.24
24. Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with $O(1)$ worst case access time. J. ACM 31(3), 538–544 (1984). https://doi.org/10.1145/828.1884
25. Alstrup, S., Brodal, G.S., Rauhe, T.: New data structures for orthogonal range searching. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science, pp. 198–207 (2000). https://doi.org/10.1109/SFCS.2000.892088
26. Fine, N.J., Wilf, H.S.: Uniqueness theorems for periodic functions. Proc. Am. Math. Soc. 16(1), 109–114 (1965). https://doi.org/10.1090/s0002-9939-1965-0174934-9
27. Belazzougui, D., Navarro, G.: Alphabet-independent compressed text indexing. ACM Trans. Algorithms 10(4), 23:1–23:19 (2014). https://doi.org/10.1145/2635816
28. Kempa, D., Kociumaka, T.: Collapsing the hierarchy of compressed data structures: suffix arrays in optimal compressed space. In: Proceedings of the 64th Annual IEEE Symposium on Foundations of Computer Science (FOCS) (2023). https://doi.org/10.48550/arXiv.2308.03635

Author information

Authors and Affiliations

Max Planck Institute for Informatics, Saarbrücken, Germany
Tomasz Kociumaka
Department of Computer Science, CeBiB — Centre for Biotechnology and Bioengineering, University of Chile, Santiago, Chile
Gonzalo Navarro & Francisco Olivares

Contributions

T.K. and G.N. defined the main idea, i.e., to verify whether restricted block compression would yield an RLSLP of size $O(\delta \log (n/\delta ))$, improving upon the $O(\gamma \log (n/\gamma ))$ space achieved using ordinary block compression (without pausing), that could be used for pattern matching within the times of the larger index. F.O. worked on this problem under the supervision of G.N., and they prepared the first draft of the manuscript. In the conference version, T.K. contributed a few of the most subtle proofs, including the bound on the expected total size of all productions, and simplified the construction of the set $M(P)$. While preparing the journal version, F.O. and G.N. developed the counting version of the $O(\delta \log (n/\delta ))$-size index and the slightly larger variants allowing for optimal counting and reporting. T.K. refined the analysis of the RLSLP size to achieve an $O(\delta \log \tfrac{n\log \sigma }{\delta \log n})$ bound and adapted all the proofs so that the index sizes decreased accordingly.

Corresponding author

Correspondence to Francisco Olivares.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Funded in part by Basal Funds FB0001, Fondecyt Grant 1-200038, and Ph.D Scholarship 21210579, ANID, Chile.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kociumaka, T., Navarro, G. & Olivares, F. Near-Optimal Search Time in $\delta $-Optimal Space, and Vice Versa. Algorithmica 86, 1031–1056 (2024). https://doi.org/10.1007/s00453-023-01186-0

Received: 25 February 2023
Accepted: 14 October 2023
Published: 06 November 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s00453-023-01186-0
