RLZAP: Relative Lempel-Ziv with Adaptive Pointers

  • Anthony J. Cox
  • Andrea Farruggia
  • Travis Gagie
  • Simon J. Puglisi
  • Jouni Sirén
Conference paper

DOI: 10.1007/978-3-319-46049-9_1

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9954)
Cite this paper as:
Cox A.J., Farruggia A., Gagie T., Puglisi S.J., Sirén J. (2016) RLZAP: Relative Lempel-Ziv with Adaptive Pointers. In: Inenaga S., Sadakane K., Sakai T. (eds) String Processing and Information Retrieval. SPIRE 2016. Lecture Notes in Computer Science, vol 9954. Springer, Cham

Abstract

Relative Lempel-Ziv (RLZ) is a popular algorithm for compressing databases of genomes from individuals of the same species when fast random access is desired. With Kuruppu et al.’s (SPIRE 2010) original implementation, a reference genome is selected and then the other genomes are greedily parsed into phrases exactly matching substrings of the reference. Deorowicz and Grabowski (Bioinformatics, 2011) pointed out that letting each phrase end with a mismatch character usually gives better compression, because many of the differences between individuals’ genomes are single-nucleotide substitutions. Ferrada et al. (SPIRE 2014) then pointed out that also using relative pointers and run-length compressing them usually gives even better compression. In this paper we generalize Ferrada et al.’s idea to also handle short insertions, deletions and multi-character substitutions well. We show experimentally that our generalization achieves better compression than Ferrada et al.’s implementation, with comparable random-access times.
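
The prior ideas summarized in the abstract (greedy parsing against a reference, phrases ending in a mismatch character, and run-length compressed relative pointers) can be illustrated with a minimal sketch. This is not the authors' implementation and it does not implement RLZAP's adaptive pointers. It assumes a naive quadratic search for the longest match, a relative pointer defined as the reference position minus the phrase's starting position in the target, and hypothetical function names and toy strings chosen only for illustration.

```python
from itertools import groupby


def rlz_parse(reference, target):
    """Greedily parse target into (ref_pos, length, mismatch_char) phrases.

    Naive O(|reference| * |target|) search, for illustration only.
    One character is always reserved so every phrase ends with an
    explicit (mismatch) character.
    """
    phrases = []
    i = 0
    while i < len(target):
        max_len = len(target) - i - 1  # keep one char for the mismatch
        best_pos, best_len = 0, 0
        for p in range(len(reference)):
            l = 0
            while (l < max_len and p + l < len(reference)
                   and reference[p + l] == target[i + l]):
                l += 1
            if l > best_len:  # strict '>' keeps the leftmost longest match
                best_pos, best_len = p, l
        phrases.append((best_pos, best_len, target[i + best_len]))
        i += best_len + 1
    return phrases


def relative_pointers(phrases):
    """Reference position minus the phrase's start position in the target."""
    rel, i = [], 0
    for pos, length, _ in phrases:
        rel.append(pos - i)
        i += length + 1
    return rel


def run_length(values):
    """Run-length encode a list as (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]


def decode(reference, phrases):
    """Rebuild the target from the reference and the phrases."""
    return "".join(reference[pos:pos + length] + ch
                   for pos, length, ch in phrases)


# Toy example: the target differs from the reference by one substitution.
reference = "ACGTTGCAAGCTTAGG"
target = "ACGTTGCTAGCTTAGG"
phrases = rlz_parse(reference, target)
pointer_runs = run_length(relative_pointers(phrases))
```

In this toy case the substitution ends the first phrase, and the next phrase resumes at the same offset in the reference, so the two relative pointers are equal and collapse into a single run. An insertion or deletion would instead shift the relative pointer for the phrases that follow it.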

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Anthony J. Cox (1)
  • Andrea Farruggia (2)
  • Travis Gagie (3, 4)
  • Simon J. Puglisi (3, 4)
  • Jouni Sirén (5)

  1. Illumina Cambridge Ltd., Cambridge, UK
  2. University of Pisa, Pisa, Italy
  3. Helsinki Institute for Information Technology, Espoo, Finland
  4. University of Helsinki, Helsinki, Finland
  5. Wellcome Trust Sanger Institute, Hinxton, UK
