Abstract
Relative Lempel-Ziv (RLZ) is a variant of LZ77 that can compress well collections of similar genomes while still allowing fast random access to them. In theory, at the cost of using sublinear extra space, accessing an arbitrary character takes constant time. We show that even in practice this works quite well: e.g., we can compress 36 S. cerevisiae genomes from a total of 464 MB to 11 MB and still support random access to them in under 50 nanoseconds per character, even when the accessed substrings are short. Our theoretical contribution is an optimized representation of RLZ’s pointers.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Deorowicz, S., Danek, A., Grabowski, S.: Genome compression: A novel approach for large collections. Bioinformatics 29(20), 2572–2578 (2013)
Deorowicz, S., Grabowski, S.: Robust relative compression of genomes with random access. Bioinformatics 27(21), 2979–2986 (2011)
Ferragina, P., Manzini, G.: Indexing compressed text. J. ACM 52(4), 552–581 (2005)
Gagie, T., Gawrychowski, P., Puglisi, S.J.: Faster approximate pattern matching in compressed repetitive texts. In: Asano, T., Nakano, S.-i., Okamoto, Y., Watanabe, O. (eds.) ISAAC 2011. LNCS, vol. 7074, pp. 653–662. Springer, Heidelberg (2011)
Gog, S., Beller, T., Moffat, A., Petri, M.: From theory to practice: Plug and play with succinct data structures. In: Gudmundsson, J., Katajainen, J. (eds.) SEA 2014. LNCS, vol. 8504, pp. 326–337. Springer, Heidelberg (2014)
Kärkkäinen, J., Kempa, D., Puglisi, S.J.: Lightweight Lempel-Ziv parsing. In: Bonifaci, V., Demetrescu, C., Marchetti-Spaccamela, A. (eds.) SEA 2013. LNCS, vol. 7933, pp. 139–150. Springer, Heidelberg (2013)
Kreft, S., Navarro, G.: On compressing and indexing repetitive sequences. Theor. Comp. Sci. 483, 115–133 (2013)
Kuruppu, S., Puglisi, S.J., Zobel, J.: Relative Lempel-Ziv compression of genomes for large-scale storage and retrieval. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 201–206. Springer, Heidelberg (2010)
Kuruppu, S., Puglisi, S.J., Zobel, J.: Optimized relative Lempel-Ziv compression of genomes. In: Proc. ACSC, pp. 91–98 (2011)
Maruyama, S., Tabei, Y., Sakamoto, H., Sadakane, K.: Fully-online grammar compression. In: Kurland, O., Lewenstein, M., Porat, E. (eds.) SPIRE 2013. LNCS, vol. 8214, pp. 218–229. Springer, Heidelberg (2013)
Pǎtraşcu, M.: Succincter. In: Proc. FOCS, pp. 305–313 (2008)
Verbin, E., Yu, W.: Data structure lower bounds on random access to grammar-compressed strings. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 247–258. Springer, Heidelberg (2013)
Wandelt, S., Leser, U.: FRESCO: Referential compression of highly-similar sequences. IEEE Trans. Comp. Bio. Bioinf. 10(5), 1275–1288 (2013)
Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theor. 23(3), 337–343 (1977)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ferrada, H., Gagie, T., Gog, S., Puglisi, S.J. (2014). Relative Lempel-Ziv with Constant-Time Random Access. In: Moura, E., Crochemore, M. (eds) String Processing and Information Retrieval. SPIRE 2014. Lecture Notes in Computer Science, vol 8799. Springer, Cham. https://doi.org/10.1007/978-3-319-11918-2_2
Download citation
DOI: https://doi.org/10.1007/978-3-319-11918-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11917-5
Online ISBN: 978-3-319-11918-2
eBook Packages: Computer ScienceComputer Science (R0)