Near-Optimal Search Time in \(\delta \)-Optimal Space, and Vice Versa

Algorithmica

Abstract

Two recent lower bounds on the compressibility of repetitive sequences, \(\delta \le \gamma \), have received much attention. It has been shown that a length-n string S over an alphabet of size \(\sigma \) can be represented within the optimal \(O(\delta \log \tfrac{n\log \sigma }{\delta \log n})\) space, and further, that within that space one can find all the occ occurrences in S of any length-m pattern in time \(O(m\log n + occ \log ^\epsilon n)\) for any constant \(\epsilon >0\). In contrast, the near-optimal search time \(O(m+({occ+1})\log ^\epsilon n)\) has been achieved only within \(O(\gamma \log \frac{n}{\gamma })\) space. Both results are based on considerably different locally consistent parsing techniques. The question of whether the better search time could be supported within the \(\delta \)-optimal space remained open. In this paper, we prove that both techniques can indeed be combined to obtain the best of both worlds: \(O(m+({occ+1})\log ^\epsilon n)\) search time within \(O(\delta \log \tfrac{n\log \sigma }{\delta \log n})\) space. Moreover, the number of occurrences can be computed in \(O(m+\log ^{2+\epsilon }n)\) time within \(O(\delta \log \tfrac{n\log \sigma }{\delta \log n})\) space. We also show that an extra sublogarithmic factor on top of this space enables optimal \(O(m+occ)\) search time, whereas an extra logarithmic factor enables optimal O(m) counting time.
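The abstract uses \(\delta \) without defining it. In the literature on this measure (see [7]), \(\delta \) is the substring complexity \(\max _k d_k(S)/k\), where \(d_k(S)\) is the number of distinct length-k substrings of S. The following is a minimal brute-force sketch of that definition, not the paper's method; the function name `substring_complexity` is hypothetical, and the quadratic-time enumeration is only meant for small illustrative inputs.

```python
from fractions import Fraction


def substring_complexity(s: str) -> Fraction:
    """Return max over k of d_k(s)/k, with d_k the number of distinct
    length-k substrings of s (brute force, for illustration only)."""
    n = len(s)
    best = Fraction(0)
    for k in range(1, n + 1):
        # Collect every distinct substring of length k.
        distinct = {s[i:i + k] for i in range(n - k + 1)}
        best = max(best, Fraction(len(distinct), k))
    return best
```

For instance, `substring_complexity("aaababbbaa")` evaluates to 8/3, attained at k = 3, where all 8 binary strings of length 3 occur as substrings.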

Notes

  1. In this work, we assume that P[1..m] is represented in O(m) space. For small alphabets, the packed setting, where P occupies \(O(\lceil \frac{m\log \sigma }{\log n} \rceil )\) space, could also be considered; see [3, Sec. 2.2].

  2. If \(\delta \log n \le \sqrt{n}\), then \(\delta \log \tfrac{n\log \sigma }{\delta \log \delta } \le \delta \log (n\log \sigma ) \le 2\delta \log (\sqrt{n} \log \sigma ) \le 2 \delta \log \tfrac{n\log \sigma }{\delta \log n}\). Otherwise, \(\log \delta> \frac{1}{2}\log n-\log \log n > \frac{1}{4} \log n\), so \(\delta \log \tfrac{n\log \sigma }{\delta \log \delta } \le \delta \log \tfrac{4n\log \sigma }{\delta \log n}=\delta \log \tfrac{n\log \sigma }{\delta \log n}+ 2\delta \).

  3. The unproductive tests \(u.\mathit{next} = \mathit{null}\) are charged to the primary occurrence v if the label of u is A, or to the first secondary occurrence of u, which exists by the definition of \(u.\mathit{anc}\), otherwise.
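The first chain of inequalities in Note 2 can be checked numerically. The sketch below evaluates its four quantities from left to right. It assumes base-2 logarithms, \(\sigma \ge 2\), and \(\delta \ge 2\) (so that \(\log \sigma \ge 1\) and \(\delta \log \delta \ge 1\)); these assumptions and the function name `note2_case1_chain` are not stated in the note and are introduced here for illustration only.

```python
import math


def note2_case1_chain(n: int, sigma: int, delta: float) -> list[float]:
    """Return the four quantities of the first chain in Note 2, left to right.

    Assumes base-2 logs, sigma >= 2, delta >= 2, and delta*log(n) <= sqrt(n).
    """
    lg = math.log2
    if not (sigma >= 2 and delta >= 2 and delta * lg(n) <= math.sqrt(n)):
        raise ValueError("outside the regime assumed in this sketch")
    # Leftmost term: delta * log(n log(sigma) / (delta log(delta))).
    a = delta * lg(n * lg(sigma) / (delta * lg(delta)))
    # a <= b because delta * log(delta) >= 1.
    b = delta * lg(n * lg(sigma))
    # b <= c because log(sigma) >= 1, so n log(sigma) <= (sqrt(n) log(sigma))^2.
    c = 2 * delta * lg(math.sqrt(n) * lg(sigma))
    # c <= d because delta * log(n) <= sqrt(n), so n / (delta log(n)) >= sqrt(n).
    d = 2 * delta * lg(n * lg(sigma) / (delta * lg(n)))
    return [a, b, c, d]
```

Calling `note2_case1_chain(2**20, 4, 8)` returns a non-decreasing list, matching the direction of the inequalities in the note for that choice of parameters.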

References

  1. Stephens, Z.D., Lee, S.Y., Faghri, F., Campbell, R.H., Zhai, C., Efron, M.J., Iyer, R., Schatz, M.C., Sinha, S., Robinson, G.E.: Big data: astronomical or genomical? PLoS Biol. 13(7), 1002195 (2015). https://doi.org/10.1371/journal.pbio.1002195

  2. Navarro, G.: Indexing highly repetitive string collections, part II: compressed indexes. ACM Comput. Surv. 54(2), 26:1–26:32 (2021). https://doi.org/10.1145/3432999

  3. Navarro, G.: Indexing highly repetitive string collections, part I: repetitiveness measures. ACM Comput. Surv. 54(2), 29:1–29:31 (2021). https://doi.org/10.1145/3434399

  4. Kreft, S., Navarro, G.: On compressing and indexing repetitive sequences. Theoret. Comput. Sci. 483, 115–133 (2013). https://doi.org/10.1016/j.tcs.2012.02.006

  5. Kempa, D., Prezza, N.: At the roots of dictionary compression: string attractors. In: 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, pp. 827–840 (2018). https://doi.org/10.1145/3188745.3188814

  6. Christiansen, A.R., Ettienne, M.B., Kociumaka, T., Navarro, G., Prezza, N.: Optimal-time dictionary-compressed indexes. ACM Trans. Algorithms 17(1), 8:1–8:39 (2021). https://doi.org/10.1145/3426473

  7. Kociumaka, T., Navarro, G., Prezza, N.: Towards a definitive compressibility measure for repetitive sequences. IEEE Trans. Inf. Theory (2022). https://doi.org/10.1109/TIT.2022.3224382

  8. Lempel, A., Ziv, J.: On the complexity of finite sequences. IEEE Trans. Inf. Theory 22(1), 75–81 (1976). https://doi.org/10.1109/TIT.1976.1055501

  9. Claude, F., Navarro, G.: Improved grammar-based compressed indexes. In: 19th International Symposium on String Processing and Information Retrieval, SPIRE 2012. LNCS, vol. 7608, pp. 180–192 (2012). https://doi.org/10.1007/978-3-642-34109-0_19

  10. Claude, F., Navarro, G., Pacheco, A.: Grammar-compressed indexes with logarithmic search time. J. Comput. Syst. Sci. 118, 53–74 (2021). https://doi.org/10.1016/j.jcss.2020.12.001

  11. Mehlhorn, K., Sundar, R., Uhrig, C.: Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica 17(2), 183–198 (1997). https://doi.org/10.1007/BF02522825

  12. Jeż, A.: A really simple approximation of smallest grammar. Theoret. Comput. Sci. 616, 141–150 (2016). https://doi.org/10.1016/j.tcs.2015.12.032

  13. Kociumaka, T., Navarro, G., Olivares, F.: Near-optimal search time in \(\delta \)-optimal space. In: 15th Latin American Symposium on Theoretical Informatics, LATIN 2022. LNCS, vol. 13568, pp. 88–103 (2022). https://doi.org/10.1007/978-3-031-20624-5_6

  14. Batu, T., Sahinalp, S.C.: Locally consistent parsing and applications to approximate string comparisons. In: 9th International Conference on Developments in Language Theory, DLT 2005. LNCS, vol. 3572, pp. 22–35 (2005). https://doi.org/10.1007/11505877_3

  15. Cole, R., Vishkin, U.: Deterministic coin tossing and accelerating cascades: Micro and macro techniques for designing parallel algorithms. In: 18th Annual ACM Symposium on Theory of Computing, STOC 1986, pp. 206–219 (1986). https://doi.org/10.1145/12130.12151

  16. Raskhodnikova, S., Ron, D., Rubinfeld, R., Smith, A.D.: Sublinear algorithms for approximating string compressibility. Algorithmica 65(3), 685–709 (2013). https://doi.org/10.1007/s00453-012-9618-6

  17. Kociumaka, T., Radoszewski, J., Rytter, W., Waleń, T.: Internal pattern matching queries in text and applications (2023) arXiv:1311.6235v5

  18. Birenzwige, O., Golan, S., Porat, E.: Locally consistent parsing for text indexing in small space. In: 31st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, pp. 607–626 (2020). https://doi.org/10.1137/1.9781611975994.37

  19. Kempa, D., Kociumaka, T.: Dynamic suffix array with polylogarithmic queries and updates. In: 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022, pp. 1657–1670 (2022). https://doi.org/10.1145/3519935.3520061

  20. Sahinalp, S.C., Vishkin, U.: On a parallel-algorithms method for string matching problems (overview). In: 2nd Italian Conference on Algorithms and Complexity, CIAC 1994. LNCS, vol. 778, pp. 22–32 (1994). https://doi.org/10.1007/3-540-57811-0_3

  21. Chan, T.M., Larsen, K.G., Pătraşcu, M.: Orthogonal range searching on the RAM, revisited. SoCG ’11, pp. 1–10. Association for Computing Machinery, New York (2011). https://doi.org/10.1145/1998196.1998198

  22. Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987). https://doi.org/10.1147/rd.312.0249

  23. Navarro, G.: Computing MEMs on repetitive text collections. In: 34th Annual Symposium on Combinatorial Pattern Matching, CPM 2023. LIPIcs, vol. 259, pp. 24:1–24:17 (2023). https://doi.org/10.4230/LIPIcs.CPM.2023.24

  24. Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with \(O(1)\) worst case access time. J. ACM 31(3), 538–544 (1984). https://doi.org/10.1145/828.1884

  25. Alstrup, S., Brodal, G.S., Rauhe, T.: New data structures for orthogonal range searching. Proceedings 41st Annual Symposium on Foundations of Computer Science, pp. 198–207 (2000). https://doi.org/10.1109/SFCS.2000.892088

  26. Fine, N.J., Wilf, H.S.: Uniqueness theorems for periodic functions. Proc. Am. Math. Soc. 16(1), 109–114 (1965). https://doi.org/10.1090/s0002-9939-1965-0174934-9

  27. Belazzougui, D., Navarro, G.: Alphabet-independent compressed text indexing. ACM Trans. Algorithms 10(4), 23:1–23:19 (2014). https://doi.org/10.1145/2635816

  28. Kempa, D., Kociumaka, T.: Collapsing the hierarchy of compressed data structures: Suffix arrays in optimal compressed space. In: 64th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2023 (2023). https://doi.org/10.48550/arXiv.2308.03635

Author information

Contributions

T.K. and G.N. defined the main idea, namely to verify whether restricted block compression would yield an RLSLP of size \(O(\delta \log \frac{n}{\delta })\), improving upon the \(O(\gamma \log \frac{n}{\gamma })\) space achieved using ordinary block compression (without pausing), that could be used for pattern matching within the times of the larger index. F.O. worked on this problem under the supervision of G.N., and they prepared the first draft of the manuscript. In the conference version, T.K. contributed some of the most subtle proofs, including the bound on the expected total size of all productions, and simplified the construction of the set M(P). While preparing the journal version, F.O. and G.N. developed the counting version of the \(O(\delta \log \frac{n}{\delta })\)-size index and the slightly larger variants allowing for optimal counting and reporting. T.K. refined the analysis of the RLSLP size to achieve an \(O(\delta \log \tfrac{n\log \sigma }{\delta \log n})\) bound and adapted all the proofs so the index sizes decreased accordingly.

Corresponding author

Correspondence to Francisco Olivares.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Funded in part by Basal Funds FB0001, Fondecyt Grant 1-200038, and Ph.D. Scholarship 21210579, ANID, Chile.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kociumaka, T., Navarro, G. & Olivares, F. Near-Optimal Search Time in \(\delta \)-Optimal Space, and Vice Versa. Algorithmica 86, 1031–1056 (2024). https://doi.org/10.1007/s00453-023-01186-0

  • DOI: https://doi.org/10.1007/s00453-023-01186-0
