
Algorithmica, Volume 80, Issue 7, pp. 2048–2081

Lempel–Ziv Factorization Powered by Space Efficient Suffix Trees

  • Johannes Fischer
  • Tomohiro I
  • Dominik Köppl
  • Kunihiko Sadakane
Part of the following topical collections:
  1. Special Issue on Compact Data Structures

Abstract

We show that both the Lempel–Ziv-77 and the Lempel–Ziv-78 factorization of a text of length \(n\) on an integer alphabet of size \(\sigma\) can be computed in \(\mathcal{O}(n)\) time with either \(\mathcal{O}(n \lg \sigma)\) bits of working space, or \((1+\epsilon) n \lg n + \mathcal{O}(n)\) bits (for a constant \(\epsilon > 0\)) of working space (including the space for the output, but not the text).
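To make the two factorizations concrete, here is a minimal Python sketch of their usual greedy definitions. It assumes the common LZ77 variant in which the earlier occurrence of a factor may overlap the factor itself and a character not seen before forms a single-character factor. The function names `lz77_factorize` and `lz78_factorize` are hypothetical. The LZ77 sketch takes quadratic time and the LZ78 sketch uses a plain dictionary-based trie, so they are only reference implementations for checking outputs, not the linear-time, space-efficient suffix-tree algorithms described above.

```python
def lz77_factorize(text: str) -> list[str]:
    """Greedy LZ77 (sketch): each factor is the longest prefix of the
    remaining suffix that also starts at an earlier position (overlaps
    allowed), or a single fresh character if no such prefix exists."""
    factors = []
    i, n = 0, len(text)
    while i < n:
        best = 0
        for j in range(i):  # every earlier starting position
            length = 0
            # j < i, so text[j + length] never runs past the end
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            best = max(best, length)
        length = max(best, 1)  # fresh character when nothing matches
        factors.append(text[i:i + length])
        i += length
    return factors


def lz78_factorize(text: str) -> list[str]:
    """LZ78 (sketch): each factor is the longest previous factor that is a
    prefix of the remaining text, extended by one character. The last
    factor may equal an earlier factor if the text ends first."""
    trie: dict = {}  # each trie node is a dict mapping char -> child node
    factors = []
    i, n = 0, len(text)
    while i < n:
        node, start = trie, i
        # walk down the trie of previous factors as far as possible
        while i < n and text[i] in node:
            node = node[text[i]]
            i += 1
        if i < n:
            node[text[i]] = {}  # new leaf = new factor
            i += 1
        factors.append(text[start:i])
    return factors


print(lz77_factorize("abababa"))  # ['a', 'b', 'ababa']
print(lz78_factorize("abababa"))  # ['a', 'b', 'ab', 'aba']
```

On the same input, LZ77 can reuse an arbitrarily long earlier substring in one factor, whereas LZ78 grows its factors by at most one character beyond a previous factor, which is why LZ78 typically produces more factors.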

Keywords

Lempel–Ziv · Lossless compression · Succinct suffix trees

Notes

Acknowledgements

We thank the anonymous reviewers for their careful reading of our manuscript and their insightful comments and suggestions. We are especially grateful to the reviewer who pointed out a simplification of our original solution for storing the exploration counters of the LZ78 factorization (Sect. 4.1). Further, we are grateful to Sean Tohidi, who spell-checked the initial submission of this paper during his DAAD RISE internship at TU Dortmund. This research was supported by CREST, JST.


Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Department of Computer Science, TU Dortmund, Dortmund, Germany
  2. Department of Artificial Intelligence, Kyushu Institute of Technology, Fukuoka, Japan
  3. Graduate School of Information Science and Technology, University of Tokyo, Tokyo, Japan
