Faster Compressed Suffix Trees for Repetitive Text Collections

  • Gonzalo Navarro
  • Alberto Ordóñez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8504)

Abstract

Recent compressed suffix trees targeted at highly repetitive text collections reach excellent compression performance, but their operation times are on the order of milliseconds. We design a new suffix tree representation for this scenario that still achieves very low space usage, only slightly larger than that of the best previous one, but supports the operations within microseconds. This puts the data structure at the same performance level as compressed suffix trees designed for standard text collections, which on repetitive collections use many times more space than our new structure.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Gonzalo Navarro (1)
  • Alberto Ordóñez (2)
  1. Dept. of Computer Science, Univ. of Chile, Chile
  2. Lab. de Bases de Datos, Univ. da Coruña, Spain
