Skip to main content

Relative Lempel-Ziv with Constant-Time Random Access

  • Conference paper
Book cover String Processing and Information Retrieval (SPIRE 2014)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 8799))

Included in the following conference series:

Abstract

Relative Lempel-Ziv (RLZ) is a variant of LZ77 that can compress well collections of similar genomes while still allowing fast random access to them. In theory, at the cost of using sublinear extra space, accessing an arbitrary character takes constant time. We show that even in practice this works quite well: e.g., we can compress 36 S. cerevisiae genomes from a total of 464 MB to 11 MB and still support random access to them in under 50 nanoseconds per character, even when the accessed substrings are short. Our theoretical contribution is an optimized representation of RLZ’s pointers.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Deorowicz, S., Danek, A., Grabowski, S.: Genome compression: A novel approach for large collections. Bioinformatics 29(20), 2572–2578 (2013)

    Article  Google Scholar 

  2. Deorowicz, S., Grabowski, S.: Robust relative compression of genomes with random access. Bioinformatics 27(21), 2979–2986 (2011)

    Article  Google Scholar 

  3. Ferragina, P., Manzini, G.: Indexing compressed text. J. ACM 52(4), 552–581 (2005)

    Article  MathSciNet  Google Scholar 

  4. Gagie, T., Gawrychowski, P., Puglisi, S.J.: Faster approximate pattern matching in compressed repetitive texts. In: Asano, T., Nakano, S.-i., Okamoto, Y., Watanabe, O. (eds.) ISAAC 2011. LNCS, vol. 7074, pp. 653–662. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  5. Gog, S., Beller, T., Moffat, A., Petri, M.: From theory to practice: Plug and play with succinct data structures. In: Gudmundsson, J., Katajainen, J. (eds.) SEA 2014. LNCS, vol. 8504, pp. 326–337. Springer, Heidelberg (2014)

    Chapter  Google Scholar 

  6. Kärkkäinen, J., Kempa, D., Puglisi, S.J.: Lightweight Lempel-Ziv parsing. In: Bonifaci, V., Demetrescu, C., Marchetti-Spaccamela, A. (eds.) SEA 2013. LNCS, vol. 7933, pp. 139–150. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  7. Kreft, S., Navarro, G.: On compressing and indexing repetitive sequences. Theor. Comp. Sci. 483, 115–133 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  8. Kuruppu, S., Puglisi, S.J., Zobel, J.: Relative Lempel-Ziv compression of genomes for large-scale storage and retrieval. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 201–206. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  9. Kuruppu, S., Puglisi, S.J., Zobel, J.: Optimized relative Lempel-Ziv compression of genomes. In: Proc. ACSC, pp. 91–98 (2011)

    Google Scholar 

  10. Maruyama, S., Tabei, Y., Sakamoto, H., Sadakane, K.: Fully-online grammar compression. In: Kurland, O., Lewenstein, M., Porat, E. (eds.) SPIRE 2013. LNCS, vol. 8214, pp. 218–229. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  11. Pǎtraşcu, M.: Succincter. In: Proc. FOCS, pp. 305–313 (2008)

    Google Scholar 

  12. Verbin, E., Yu, W.: Data structure lower bounds on random access to grammar-compressed strings. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 247–258. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  13. Wandelt, S., Leser, U.: FRESCO: Referential compression of highly-similar sequences. IEEE Trans. Comp. Bio. Bioinf. 10(5), 1275–1288 (2013)

    Article  Google Scholar 

  14. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theor. 23(3), 337–343 (1977)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ferrada, H., Gagie, T., Gog, S., Puglisi, S.J. (2014). Relative Lempel-Ziv with Constant-Time Random Access. In: Moura, E., Crochemore, M. (eds) String Processing and Information Retrieval. SPIRE 2014. Lecture Notes in Computer Science, vol 8799. Springer, Cham. https://doi.org/10.1007/978-3-319-11918-2_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-11918-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11917-5

  • Online ISBN: 978-3-319-11918-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics