Abstract
The Lempel-Ziv (LZ) 77 factorization of a string is a widely used algorithmic tool that plays a central role in compression and indexing. For a length-\(n\) string over a linearly-sortable alphabet, e.g., \(\varSigma = \{1, \dots , \sigma \}\) with \(\sigma = n^{\mathcal O(1)}\), it can be computed in \(\mathcal O(n)\) time. It is unknown whether this time can be achieved for the rightmost LZ parsing, where each referencing phrase points to its rightmost previous occurrence. The currently best solution takes \(\mathcal O(n (1 + \log \sigma / \sqrt{\log n}))\) time (Belazzougui & Puglisi, SODA 2016). We show that this problem is much easier to solve for the LZ-End factorization (Kreft & Navarro, DCC 2010), where the rightmost factorization can be obtained in \(\mathcal O(n)\) time for the greedy parsing (with phrases of maximal length), and in \(\mathcal O(n + z \sqrt{\log z})\) time for any LZ-End parsing of \(z\) phrases. We also make advances towards a linear-time solution for the general case. We show how to solve the problem for multiple non-trivial subsets of the phrases of any LZ-like parsing in \(\mathcal O(n)\) time. As a prime example, we can find the rightmost occurrence of all phrases of length \(\varOmega (\log ^{6.66} n / \log ^2 \sigma )\) in \(\mathcal O(n / \log _\sigma n)\) time and space.
Supported by Danish Research Council grant DFF-8021-002498.
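To make the notion of a rightmost parsing from the abstract concrete, the following is a minimal brute-force sketch, not the algorithm of the paper. It assumes the greedy, self-referential LZ77 variant in which a phrase may overlap its source and a character without a previous occurrence becomes a literal phrase; the function names and the `(start, length, source)` tuple layout are hypothetical choices made for illustration. It runs in polynomial time and is meant only to show what output the linear-time algorithms must produce.

```python
from typing import List, Optional, Tuple

# (start, length, source); source is None for a literal phrase.
Phrase = Tuple[int, int, Optional[int]]


def longest_previous_match(text: str, i: int) -> int:
    """Length of the longest prefix of text[i:] that also starts at some j < i."""
    best = 0
    for j in range(i):
        length = 0
        # The source may overlap the phrase itself (self-referential variant).
        while i + length < len(text) and text[j + length] == text[i + length]:
            length += 1
        best = max(best, length)
    return best


def rightmost_lz77(text: str) -> List[Phrase]:
    """Greedy LZ77 parsing in which every referencing phrase points to its
    rightmost previous occurrence (brute force, for illustration only)."""
    phrases: List[Phrase] = []
    i = 0
    while i < len(text):
        length = longest_previous_match(text, i)
        if length == 0:
            # No previous occurrence: emit a literal phrase of length 1.
            phrases.append((i, 1, None))
            i += 1
            continue
        # Scan candidate sources from right to left; the first hit is the rightmost.
        for j in range(i - 1, -1, -1):
            if text[j:j + length] == text[i:i + length]:
                phrases.append((i, length, j))
                break
        i += length
    return phrases


if __name__ == "__main__":
    for phrase in rightmost_lz77("ababcab"):
        print(phrase)
```

For the input `ababcab`, the last phrase `ab` (starting at position 5) occurs earlier at positions 0 and 2; the rightmost parsing selects source 2, whereas a leftmost parsing would select 0.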
References
Amir, A., Landau, G.M., Ukkonen, E.: Online timestamped text indexing. Inf. Process. Lett. 82(5), 253–259 (2002). https://doi.org/10.1016/S0020-0190(01)00275-7
Bannai, H., Funakoshi, M., Kurita, K., Nakashima, Y., Seto, K., Uno, T.: Optimal LZ-End parsing is hard. In: Proceedings of the 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023) (2023). https://doi.org/10.4230/LIPIcs.CPM.2023.3
Belazzougui, D., Puglisi, S.J.: Range predecessor and Lempel-Ziv parsing. In: Proceedings of the 27th Annual Symposium on Discrete Algorithms (SODA 2016), pp. 2053–2071. Arlington, VA, USA (2016). https://doi.org/10.1137/1.9781611974331.ch143
Bille, P., Cording, P.H., Fischer, J., Gørtz, I.L.: Lempel-Ziv compression in a sliding window. In: Proceedings of the 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017), pp. 15:1–15:11. Warsaw, Poland (2017). https://doi.org/10.4230/LIPIcs.CPM.2017.15
Chan, T.M., Tsakalidis, K.: Dynamic orthogonal range searching on the RAM, revisited. J. Comput. Geom. 9(2), 45–66 (2018). https://doi.org/10.20382/jocg.v9i2a5
Crochemore, M., Langiu, A., Mignosi, F.: The rightmost equal-cost position problem. In: Proceedings of the 2013 Data Compression Conference (DCC 2013), pp. 421–430. Snowbird, UT, USA (2013). https://doi.org/10.1109/DCC.2013.50
Crochemore, M., Rytter, W.: Efficient parallel algorithms to test square-freeness and factorize strings. Inf. Process. Lett. 38(2), 57–60 (1991). https://doi.org/10.1016/0020-0190(91)90223-5
Ellert, J.: Sublinear time Lempel-Ziv (LZ77) factorization. In: Proceedings of the 30th International Symposium on String Processing and Information Retrieval (SPIRE 2023). Pisa, Italy (2023)
Farach, M., Muthukrishnan, S.: Optimal parallel dictionary matching and compression (extended abstract). In: Proceedings of the 7th Annual Symposium on Parallel Algorithms and Architectures (SPAA 1995), pp. 244–253. Santa Barbara, California, USA (1995). https://doi.org/10.1145/215399.215451
Ferragina, P., Nitto, I., Venturini, R.: On the bit-complexity of Lempel-Ziv compression. SIAM J. Comput. 42(4), 1521–1541 (2013). https://doi.org/10.1137/120869511
Fischer, J., I, T., Köppl, D.: Lempel Ziv computation in small space (LZ-CISS). In: Cicalese, F., Porat, E., Vaccaro, U. (eds.) CPM 2015. LNCS, vol. 9133, pp. 172–184. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19929-0_15
Fischer, J., I, T., Köppl, D., Sadakane, K.: Lempel-Ziv factorization powered by space efficient suffix trees. Algorithmica 80(7), 2048–2081 (2018). https://doi.org/10.1007/s00453-017-0333-1
Fredkin, E.: Trie memory. Commun. ACM 3(9), 490–499 (1960). https://doi.org/10.1145/367390.367400
Goto, K., Bannai, H.: Simpler and faster Lempel Ziv factorization. In: Proceedings of the 2013 Data Compression Conference (DCC 2013), pp. 133–142. Snowbird, UT, USA (2013). https://doi.org/10.1109/DCC.2013.21
Goto, K., Bannai, H.: Space efficient linear time Lempel-Ziv factorization for small alphabets. In: Proceedings of the 2014 Data Compression Conference (DCC 2014), pp. 163–172. Snowbird, UT, USA (2014). https://doi.org/10.1109/DCC.2014.62
Hagerup, T.: Sorting and searching on the word RAM. In: Morvan, M., Meinel, C., Krob, D. (eds.) STACS 1998. LNCS, vol. 1373, pp. 366–398. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0028575
Kärkkäinen, J., Kempa, D., Puglisi, S.J.: Lightweight Lempel-Ziv parsing. In: Bonifaci, V., Demetrescu, C., Marchetti-Spaccamela, A. (eds.) SEA 2013. LNCS, vol. 7933, pp. 139–150. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38527-8_14
Kärkkäinen, J., Kempa, D., Puglisi, S.J.: Linear time Lempel-Ziv factorization: simple, fast, small. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 189–200. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38905-4_19
Kempa, D., Kociumaka, T.: String synchronizing sets: sublinear-time BWT construction and optimal LCE data structure. In: Proceedings of the 51st Annual Symposium on Theory of Computing (STOC 2019), pp. 756–767. Phoenix, AZ, USA (2019). https://doi.org/10.1145/3313276.3316368
Kempa, D., Kosolobov, D.: LZ-End parsing in linear time. In: Proceedings of the 25th Annual European Symposium on Algorithms (ESA 2017), pp. 53:1–53:14. Vienna, Austria (2017). https://doi.org/10.4230/LIPIcs.ESA.2017.53
Kempa, D., Saha, B.: An upper bound and linear-space queries on the LZ-End parsing. In: Proceedings of the 33rd Annual Symposium on Discrete Algorithms (SODA 2022), pp. 2847–2866. Alexandria, VA, USA (Virtual Conference) (2022). https://doi.org/10.1137/1.9781611977073.111
Kosolobov, D.: Faster lightweight Lempel-Ziv parsing. In: Italiano, G.F., Pighizzini, G., Sannella, D.T. (eds.) MFCS 2015. LNCS, vol. 9235, pp. 432–444. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48054-0_36
Kreft, S., Navarro, G.: LZ77-like compression with fast random access. In: Proceedings of the 2010 Data Compression Conference (DCC 2010), pp. 239–248. Snowbird, UT, USA (2010). https://doi.org/10.1109/DCC.2010.29
Kreft, S., Navarro, G.: On compressing and indexing repetitive sequences. Theor. Comput. Sci. 483, 115–133 (2013). https://doi.org/10.1016/j.tcs.2012.02.006
Kärkkäinen, J., Kempa, D., Puglisi, S.J.: Lempel-Ziv parsing in external memory. In: Proceedings of the 2014 Data Compression Conference (DCC 2014), pp. 153–162. Snowbird, UT, USA (2014). https://doi.org/10.1109/DCC.2014.78
Köppl, D., Sadakane, K.: Lempel-Ziv computation in compressed space (LZ-CICS). In: Proceedings of the 2016 Data Compression Conference (DCC 2016), pp. 3–12. Snowbird, UT, USA (2016). https://doi.org/10.1109/DCC.2016.38
Larsson, N.J.: Most recent match queries in on-line suffix trees. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 252–261. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07566-2_26
Lempel, A., Ziv, J.: On the complexity of finite sequences. IEEE Trans. Inf. Theory 22(1), 75–81 (1976). https://doi.org/10.1109/TIT.1976.1055501
Manber, U., Myers, E.W.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993). https://doi.org/10.1137/0222058
Naor, M.: String matching with preprocessing of text and pattern. In: Albert, J.L., Monien, B., Artalejo, M.R. (eds.) ICALP 1991. LNCS, vol. 510, pp. 739–750. Springer, Heidelberg (1991). https://doi.org/10.1007/3-540-54233-7_179
Navarro, G.: Compact Data Structures: A Practical Approach. Cambridge University Press, Cambridge (2016). https://doi.org/10.1017/CBO9781316588284
Ohlebusch, E., Gog, S.: Lempel-Ziv factorization revisited. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 15–26. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21458-5_4
Okanohara, D., Sadakane, K.: An online algorithm for finding the longest previous factors. In: Halperin, D., Mehlhorn, K. (eds.) ESA 2008. LNCS, vol. 5193, pp. 696–707. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-87744-8_58
Shun, J.: Parallel Lempel-Ziv factorization, chap. 13. Association for Computing Machinery and Morgan & Claypool (2018). https://doi.org/10.1145/3018787.3018801
Shun, J., Zhao, F.: Practical parallel Lempel-Ziv factorization. In: Proceedings of the 2013 Data Compression Conference (DCC 2013), pp. 123–132. Snowbird, UT, USA (2013). https://doi.org/10.1109/DCC.2013.20
Starikovskaya, T.: Computing Lempel-Ziv factorization online. In: Rovan, B., Sassone, V., Widmayer, P. (eds.) MFCS 2012. LNCS, vol. 7464, pp. 789–799. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32589-2_68
Storer, J.A., Szymanski, T.G.: Data compression via textual substitution. J. ACM 29(4), 928–951 (1982). https://doi.org/10.1145/322344.322346
Weiner, P.: Linear pattern matching algorithms. In: Proceedings of the 14th Annual Symposium on Switching and Automata Theory (SWAT 1973), pp. 1–11. Iowa City, IA, USA (1973). https://doi.org/10.1109/SWAT.1973.13
Yamamoto, J., I, T., Bannai, H., Inenaga, S., Takeda, M.: Faster compact on-line Lempel-Ziv factorization. In: Proceedings of the 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014), pp. 675–686. Lyon, France (2014). https://doi.org/10.4230/LIPIcs.STACS.2014.675
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ellert, J., Fischer, J., Pedersen, M.R. (2023). New Advances in Rightmost Lempel-Ziv. In: Nardini, F.M., Pisanti, N., Venturini, R. (eds) String Processing and Information Retrieval. SPIRE 2023. Lecture Notes in Computer Science, vol 14240. Springer, Cham. https://doi.org/10.1007/978-3-031-43980-3_15
DOI: https://doi.org/10.1007/978-3-031-43980-3_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43979-7
Online ISBN: 978-3-031-43980-3
eBook Packages: Computer Science, Computer Science (R0)