Skip to main content
Log in

Compressed Spaced Suffix Arrays

  • Published:
Mathematics in Computer Science Aims and scope Submit manuscript

Abstract

As a first step in designing relatively-compressed data structures—i.e., such that storing an instance for one dataset helps us store instances for similar datasets—we consider how to compress spaced suffix arrays relative to normal suffix arrays and still support fast access to them. This problem is of practical interest when performing similarity search with spaced seeds because using several seeds in parallel significantly improves their performance, but with existing approaches we keep a separate linear-space hash table or spaced suffix array for each seed. We first prove a theoretical upper bound on the space needed to store a spaced suffix array when we already have the suffix array. We then present experiments indicating that our approach works even better in practice.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Barbay, J., Claude, F., Gagie, T., Navarro, G., Nekrich, Y.: Efficient fully-compressed sequence representations. Algorithmica 69, 232–268 (2014)

    Article  MathSciNet  MATH  Google Scholar 

  2. Battaglia, G., Cangelosi, D., Grossi, R., Pisanti, N.: Masking patterns in sequences: a new class of motif discovery with don’t cares. Theor. Comput. Sci. 410, 4327–4340 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  3. Belazzougui, D., Gagie, T., Gog, S., Manzini, G., Sirén, J.: Relative FM-indexes. In: Proceedings of the 21st Symposium on String Processing and Information Retrieval (SPIRE), pp. 52–64 (2014)

  4. Belazzougui, D., Navarro, G.: Alphabet-independent compressed text indexing. ACM Trans. Algorithms 11(4) (2015)

  5. Boucher, C., Bowe, A., Gagie, T., Manzini, G., Sirén, J.: Relative select. In: Proceedings of the 22nd Symposium on String Processing and Information Retrieval (SPIRE), pp. 149–155 (2015)

  6. Bowe, A., Onodera, T., Sadakane, K., Shibuya, T.: Succinct de Bruijn graphs. In: Proceedings of the 12th Workshop on Algorithms in Bioinformatics (WABI), pp. 225–235 (2012)

  7. Brown, D.G.: A survey of seeding for sequence alignment. In: Mǎndoiu, I., Zelikovsky, A. (eds.) Bioinformatics Algorithms: Techniques and Applications, pp. 126–152. Wiley-Interscience, Hoboken (2008)

    Google Scholar 

  8. Burkhardt, S., Kärkkäinen, J.: Better filtering with gapped q-grams. Fundamenta Informicae 56, 51–70 (2003)

    MathSciNet  MATH  Google Scholar 

  9. Burrows, M., Wheeler, D.J.: A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation (1994)

  10. Crochemore, M., Tischler, G.: The gapped suffix array: a new index structure for fast approximate matching. In: Proceedings of the 17th Symposium on String Processing and Information Retrieval (SPIRE), pp. 359–364 (2010)

  11. David, M., Dzamba, M., Lister, D., Ilie, L., Brudno, M.: SHRiMP2: sensitive yet practical short read mapping. Bioinformatics 27, 1011–1012 (2011)

    Article  Google Scholar 

  12. Egidi, L., Manzini, G.: Better spaced seeds using quadratic residues. J. Comput. Syst. Sci. 79, 1144–1155 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  13. Ferragina, P., Manzini, G.: Indexing compressed text. J. ACM 52, 552–581 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  14. Gagie, T., Manzini, G., Valenzuela, D.: Compressed spaced suffix arrays. In: Proceedings of the 2nd International Conference on Algorithms for Big Data (ICABD), pp. 37–45 (2014)

  15. Gagie, T., Navarro, G., Puglisi, S.J., Sirén, J.: Relative compressed suffix trees. Technical Report. arXiv:1508.02550 (2015)

  16. Homer, N., Merriman, B., Nelson, S.F.: BFAST: an alignment tool for large scale genome resequencing. PLOS One 4, e7767 (2009)

    Article  Google Scholar 

  17. Ilie, L., Ilie, S., Khoshraftar, S., Mansouri Bigvand, A.: Seeds for effective oligonucleotide design. BMC Genomics 12, 280 (2011)

    Article  Google Scholar 

  18. Iqbal, Z., Caccamo, M., Turner, I., Flicek, P., McVean, G.: De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat. Genet. 44, 226–232 (2012)

    Article  Google Scholar 

  19. Kiełbasa, S.M., Wan, R., Sato, K., Horton, P., Frith, M.C.: Adaptive seeds tame genomic sequence comparison. Genome Res. 21, 487–493 (2011)

    Article  Google Scholar 

  20. Langmeand, B., Salzberg, S.L.: Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012)

    Article  Google Scholar 

  21. Ma, B., Tromp, J., Li, M.: PatternHunter: faster and more sensitive homology search. Bioinformatics 18, 440–445 (2002)

    Article  Google Scholar 

  22. Peterlongo, P., Pisanti, N., Boyer, F., Pereira do Lago, A., Sagot, M.: Lossless filter for multiple repetitions with Hamming distance. J. Discrete Algorithms 6(3), 497–509 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  23. Russo, L.M.S., Tischler, G.: Succinct gapped suffix arrays. In: Proceedings of the 17th Symposium on String Processing and Information Retrieval (SPIRE), pp. 290–294 (2011)

  24. Sun, Y., Buhler, J.: Designing multiple simultaneous seeds for DNA similarity search. J. Comput. Biol. 12, 847–861 (2005)

    Article  Google Scholar 

  25. Supowit, K.J.: Decomposing a set of points into chains, with applications to permutation and circle graphs. Inform. Process. Lett. 21, 249–252 (1985)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Travis Gagie.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Gagie, T., Manzini, G. & Valenzuela, D. Compressed Spaced Suffix Arrays. Math.Comput.Sci. 11, 151–157 (2017). https://doi.org/10.1007/s11786-016-0283-z

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11786-016-0283-z

Keywords

Mathematics Subject Classification

Navigation