New Advances in Rightmost Lempel-Ziv

  • Conference paper
String Processing and Information Retrieval (SPIRE 2023)

Abstract

The Lempel-Ziv (LZ) 77 factorization of a string is a widely used algorithmic tool that plays a central role in compression and indexing. For a length-n string over a linearly-sortable alphabet, e.g., \(\varSigma = \{1, \dots , \sigma \}\) with \(\sigma = n^{\mathcal O(1)}\), it can be computed in \(\mathcal O(n)\) time. It is unknown whether this time can be achieved for the rightmost LZ parsing, where each referencing phrase points to its rightmost previous occurrence. The best known solution takes \(\mathcal O(n (1 + \log \sigma / \sqrt{\log n}))\) time (Belazzougui & Puglisi, SODA 2016). We show that this problem is much easier to solve for the LZ-End factorization (Kreft & Navarro, DCC 2010), where the rightmost factorization can be obtained in \(\mathcal O(n)\) time for the greedy parsing (with phrases of maximal length), and in \(\mathcal O(n + z \sqrt{\log z})\) time for any LZ-End parsing of z phrases. We also make advances towards a linear-time solution for the general case. We show how to find the rightmost occurrences for multiple non-trivial subsets of the phrases of any LZ-like parsing in \(\mathcal O(n)\) time. As a prime example, we can find the rightmost occurrence of all phrases of length \(\varOmega (\log ^{6.66} n / \log ^2 \sigma )\) in \(\mathcal O(n / \log _\sigma n)\) time and space.
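To make the notion of a rightmost parsing concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm of the paper and comes nowhere near the running times stated above; it only illustrates what "each referencing phrase points to its rightmost previous occurrence" means for a greedy LZ77 parsing. The function name `rightmost_lz77` and the output convention (a single character for a literal phrase, a `(source, length)` pair for a referencing phrase, with self-overlapping sources allowed) are assumptions made for this illustration.

```python
def rightmost_lz77(text):
    """Naive greedy LZ77 parsing with rightmost sources (illustrative only).

    Returns a list of phrases: a single character for a literal phrase,
    or a (source, length) pair for a referencing phrase, where `source`
    is the rightmost earlier starting position of a longest match.
    """
    phrases = []
    n = len(text)
    i = 0
    while i < n:
        best_len, best_src = 0, -1
        # Try every earlier starting position j < i. The source may
        # overlap the phrase itself, as is usual for LZ77.
        for j in range(i):
            length = 0
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            # ">=" lets a later j win ties, so among all longest matches
            # the rightmost starting position is kept.
            if length > 0 and length >= best_len:
                best_len, best_src = length, j
        if best_len == 0:
            phrases.append(text[i])  # character not seen before: literal
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases


print(rightmost_lz77("abxabyab"))
# → ['a', 'b', 'x', (0, 2), 'y', (3, 2)]
```

In this example the final phrase `ab` occurs earlier at positions 0 and 3. A leftmost parsing would report source 0, whereas the rightmost parsing reports source 3.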

Supported by Danish Research Council grant DFF-8021-002498.

References

  1. Amir, A., Landau, G.M., Ukkonen, E.: Online timestamped text indexing. Inf. Process. Lett. 82(5), 253–259 (2002). https://doi.org/10.1016/S0020-0190(01)00275-7

  2. Bannai, H., Funakoshi, M., Kurita, K., Nakashima, Y., Seto, K., Uno, T.: Optimal LZ-end parsing is hard. In: Proceedings of the 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023) (2023). https://doi.org/10.4230/LIPIcs.CPM.2023.3

  3. Belazzougui, D., Puglisi, S.J.: Range predecessor and Lempel-Ziv parsing. In: Proceedings of the 27th Annual Symposium on Discrete Algorithms (SODA 2016), pp. 2053–2071. Arlington, VA, USA (2016). https://doi.org/10.1137/1.9781611974331.ch143

  4. Bille, P., Cording, P.H., Fischer, J., Gørtz, I.L.: Lempel-Ziv compression in a sliding window. In: Proceedings of the 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017), pp. 15:1–15:11. Warsaw, Poland (2017). https://doi.org/10.4230/LIPIcs.CPM.2017.15

  5. Chan, T.M., Tsakalidis, K.: Dynamic orthogonal range searching on the RAM, revisited. J. Comput. Geom. 9(2), 45–66 (2018). https://doi.org/10.20382/jocg.v9i2a5

  6. Crochemore, M., Langiu, A., Mignosi, F.: The rightmost equal-cost position problem. In: Proceedings of the 2013 Data Compression Conference (DCC 2013), pp. 421–430. Snowbird, UT, USA (2013). https://doi.org/10.1109/DCC.2013.50

  7. Crochemore, M., Rytter, W.: Efficient parallel algorithms to test square-freeness and factorize strings. Inf. Process. Lett. 38(2), 57–60 (1991). https://doi.org/10.1016/0020-0190(91)90223-5

  8. Ellert, J.: Sublinear time Lempel-Ziv (LZ77) factorization. In: Proceedings of the 30th International Symposium on String Processing and Information Retrieval (SPIRE 2023). Pisa, Italy (2023)

  9. Farach, M., Muthukrishnan, S.: Optimal parallel dictionary matching and compression (extended abstract). In: Proceedings of the 7th Annual Symposium on Parallel Algorithms and Architectures (SPAA 1995), pp. 244–253. Santa Barbara, California, USA (1995). https://doi.org/10.1145/215399.215451

  10. Ferragina, P., Nitto, I., Venturini, R.: On the bit-complexity of Lempel-Ziv compression. SIAM J. Comput. 42(4), 1521–1541 (2013). https://doi.org/10.1137/120869511

  11. Fischer, J., I, T., Köppl, D.: Lempel Ziv computation in small space (LZ-CISS). In: Cicalese, F., Porat, E., Vaccaro, U. (eds.) CPM 2015. LNCS, vol. 9133, pp. 172–184. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19929-0_15

  12. Fischer, J., I, T., Köppl, D., Sadakane, K.: Lempel-Ziv factorization powered by space efficient suffix trees. Algorithmica 80(7), 2048–2081 (2018). https://doi.org/10.1007/s00453-017-0333-1

  13. Fredkin, E.: Trie memory. Commun. ACM 3(9), 490–499 (1960). https://doi.org/10.1145/367390.367400

  14. Goto, K., Bannai, H.: Simpler and faster Lempel Ziv factorization. In: Proceedings of the 2013 Data Compression Conference (DCC 2013), pp. 133–142. Snowbird, UT, USA (2013). https://doi.org/10.1109/DCC.2013.21

  15. Goto, K., Bannai, H.: Space efficient linear time Lempel-Ziv factorization for small alphabets. In: Proceedings of the 2014 Data Compression Conference (DCC 2014), pp. 163–172. Snowbird, UT, USA (2014). https://doi.org/10.1109/DCC.2014.62

  16. Hagerup, T.: Sorting and searching on the word RAM. In: Morvan, M., Meinel, C., Krob, D. (eds.) STACS 1998. LNCS, vol. 1373, pp. 366–398. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0028575

  17. Kärkkäinen, J., Kempa, D., Puglisi, S.J.: Lightweight Lempel-Ziv parsing. In: Bonifaci, V., Demetrescu, C., Marchetti-Spaccamela, A. (eds.) SEA 2013. LNCS, vol. 7933, pp. 139–150. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38527-8_14

  18. Kärkkäinen, J., Kempa, D., Puglisi, S.J.: Linear time Lempel-Ziv factorization: simple, fast, small. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 189–200. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38905-4_19

  19. Kempa, D., Kociumaka, T.: String synchronizing sets: sublinear-time BWT construction and optimal LCE data structure. In: Proceedings of the 51st Annual Symposium on Theory of Computing (STOC 2019), pp. 756–767. Phoenix, AZ, USA (2019). https://doi.org/10.1145/3313276.3316368

  20. Kempa, D., Kosolobov, D.: LZ-end parsing in linear time. In: Proceedings of the 25th Annual European Symposium on Algorithms (ESA 2017), pp. 53:1–53:14. Vienna, Austria (2017). https://doi.org/10.4230/LIPIcs.ESA.2017.53

  21. Kempa, D., Saha, B.: An upper bound and linear-space queries on the LZ-end parsing. In: Proceedings of the 33rd Annual Symposium on Discrete Algorithms (SODA 2022), pp. 2847–2866. Alexandria, VA, USA (Virtual Conference) (2022). https://doi.org/10.1137/1.9781611977073.111

  22. Kosolobov, D.: Faster lightweight Lempel-Ziv parsing. In: Italiano, G.F., Pighizzini, G., Sannella, D.T. (eds.) MFCS 2015. LNCS, vol. 9235, pp. 432–444. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48054-0_36

  23. Kreft, S., Navarro, G.: LZ77-like compression with fast random access. In: Proceedings of the 2010 Data Compression Conference (DCC 2010), pp. 239–248. Snowbird, UT, USA (2010). https://doi.org/10.1109/DCC.2010.29

  24. Kreft, S., Navarro, G.: On compressing and indexing repetitive sequences. Theor. Comput. Sci. 483, 115–133 (2013). https://doi.org/10.1016/j.tcs.2012.02.006

  25. Kärkkäinen, J., Kempa, D., Puglisi, S.J.: Lempel-Ziv parsing in external memory. In: Proceedings of the 2014 Data Compression Conference (DCC 2014), pp. 153–162. Snowbird, UT, USA (2014). https://doi.org/10.1109/DCC.2014.78

  26. Köppl, D., Sadakane, K.: Lempel-Ziv computation in compressed space (LZ-CICS). In: Proceedings of the 2016 Data Compression Conference (DCC 2016), pp. 3–12. Snowbird, UT, USA (2016). https://doi.org/10.1109/DCC.2016.38

  27. Larsson, N.J.: Most recent match queries in on-line suffix trees. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 252–261. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07566-2_26

  28. Lempel, A., Ziv, J.: On the complexity of finite sequences. IEEE Trans. Inf. Theory 22(1), 75–81 (1976). https://doi.org/10.1109/TIT.1976.1055501

  29. Manber, U., Myers, E.W.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993). https://doi.org/10.1137/0222058

  30. Naor, M.: String matching with preprocessing of text and pattern. In: Albert, J.L., Monien, B., Artalejo, M.R. (eds.) ICALP 1991. LNCS, vol. 510, pp. 739–750. Springer, Heidelberg (1991). https://doi.org/10.1007/3-540-54233-7_179

  31. Navarro, G.: Compact Data Structures: A Practical Approach. Cambridge University Press, Cambridge (2016). https://doi.org/10.1017/CBO9781316588284

  32. Ohlebusch, E., Gog, S.: Lempel-Ziv factorization revisited. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 15–26. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21458-5_4

  33. Okanohara, D., Sadakane, K.: An online algorithm for finding the longest previous factors. In: Halperin, D., Mehlhorn, K. (eds.) ESA 2008. LNCS, vol. 5193, pp. 696–707. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-87744-8_58

  34. Shun, J.: Parallel Lempel-Ziv factorization, chap. 13. Association for Computing Machinery and Morgan & Claypool (2018). https://doi.org/10.1145/3018787.3018801

  35. Shun, J., Zhao, F.: Practical parallel Lempel-Ziv factorization. In: Proceedings of the 2013 Data Compression Conference (DCC 2013), pp. 123–132. Snowbird, UT, USA (2013). https://doi.org/10.1109/DCC.2013.20

  36. Starikovskaya, T.: Computing Lempel-Ziv factorization online. In: Rovan, B., Sassone, V., Widmayer, P. (eds.) MFCS 2012. LNCS, vol. 7464, pp. 789–799. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32589-2_68

  37. Storer, J.A., Szymanski, T.G.: Data compression via textual substitution. J. ACM 29(4), 928–951 (1982). https://doi.org/10.1145/322344.322346

  38. Weiner, P.: Linear pattern matching algorithms. In: Proceedings of the 14th Annual Symposium on Switching and Automata Theory (SWAT 1973), pp. 1–11. Iowa City, IA, USA (1973). https://doi.org/10.1109/SWAT.1973.13

  39. Yamamoto, J., I, T., Bannai, H., Inenaga, S., Takeda, M.: Faster compact on-line Lempel-Ziv factorization. In: Proceedings of the 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014), pp. 675–686. Lyon, France (2014). https://doi.org/10.4230/LIPIcs.STACS.2014.675

Author information

Corresponding author

Correspondence to Jonas Ellert .

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ellert, J., Fischer, J., Pedersen, M.R. (2023). New Advances in Rightmost Lempel-Ziv. In: Nardini, F.M., Pisanti, N., Venturini, R. (eds) String Processing and Information Retrieval. SPIRE 2023. Lecture Notes in Computer Science, vol 14240. Springer, Cham. https://doi.org/10.1007/978-3-031-43980-3_15

  • DOI: https://doi.org/10.1007/978-3-031-43980-3_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43979-7

  • Online ISBN: 978-3-031-43980-3

  • eBook Packages: Computer Science, Computer Science (R0)
