Alphabet-Independent Compressed Text Indexing

  • Djamal Belazzougui
  • Gonzalo Navarro
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6942)

Abstract

Self-indexes can represent a text in asymptotically optimal space under the k-th order entropy model, give access to text substrings, and support indexed pattern searches. Their time complexities are not optimal, however: they always depend on the alphabet size. In this paper we achieve, for the first time, full alphabet-independence in the time complexities of self-indexes, while retaining space optimality. We obtain also some relevant byproducts on compressed suffix trees.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Apostolico, A.: The myriad virtues of subword trees. In: Combinatorial Algorithms on Words. NATO ISI Series, pp. 85–96. Springer, Heidelberg (1985)CrossRefGoogle Scholar
  2. 2.
    Barbay, J., Gagie, T., Navarro, G., Nekrich, Y.: Alphabet partitioning for compressed rank/Select and applications. In: Cheong, O., Chwa, K.-Y., Park, K. (eds.) ISAAC 2010, Part II. LNCS, vol. 6507, pp. 315–326. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  3. 3.
    Barbay, J., He, M., Munro, J.I., Rao, S.S.: Succinct indexes for strings, binary relations and multi-labeled trees. In: SODA, pp. 680–689 (2007)Google Scholar
  4. 4.
    Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Monotone minimal perfect hashing: searching a sorted table with o(1) accesses. In: SODA, pp. 785–794 (2009)Google Scholar
  5. 5.
    Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Theory and practise of monotone minimal perfect hashing. In: ALENEX (2009)Google Scholar
  6. 6.
    Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation (1994)Google Scholar
  7. 7.
    Ferragina, P., Giancarlo, R., Manzini, G., Sciortino, M.: Boosting textual compression in optimal linear time. J. ACM 52(4), 688–713 (2005)MathSciNetCrossRefMATHGoogle Scholar
  8. 8.
    Ferragina, P., Grossi, R.: The string b-tree: A new data structure for string search in external memory and its applications. J. ACM 46(2), 236–280 (1999)MathSciNetCrossRefMATHGoogle Scholar
  9. 9.
    Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: FOCS, pp. 390–398 (2000)Google Scholar
  10. 10.
    Ferragina, P., Manzini, G.: Indexing compressed text. J. ACM 52(4), 552–581 (2005)MathSciNetCrossRefMATHGoogle Scholar
  11. 11.
    Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM Trans. Alg. 3(2), article 20 (2007)Google Scholar
  12. 12.
    Gagie, T.: Large alphabets and incompressibility. Inf. Proc. Lett. 99(6), 246–251 (2006)MathSciNetCrossRefMATHGoogle Scholar
  13. 13.
    Golynski, A., Munro, J.I., Rao, S.S.: Rank/select operations on large alphabets: a tool for text indexing. In: SODA, pp. 368–373 (2006)Google Scholar
  14. 14.
    Grossi, R., Gupta, A., Vitter, J.: High-order entropy-compressed text indexes. In: SODA, pp. 841–850 (2003)Google Scholar
  15. 15.
    Grossi, R., Orlandi, A., Raman, R.: Optimal trade-offs for succinct string indexes. In: Abramsky, S., Gavoille, C., Kirchner, C., Meyer auf der Heide, F., Spirakis, P.G. (eds.) ICALP 2010. LNCS, vol. 6198, pp. 678–689. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  16. 16.
    Grossi, R., Vitter, J.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. In: STOC, pp. 397–406 (2000)Google Scholar
  17. 17.
    Lee, S., Park, K.: Dynamic rank-select structures with applications to run-length encoded texts. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 95–106. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  18. 18.
    Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. SIAM J. Comp. 22(5), 935–948 (1993)MathSciNetCrossRefMATHGoogle Scholar
  19. 19.
    Manzini, G.: An analysis of the Burrows-Wheeler transform. J. ACM 48(3), 407–430 (2001)MathSciNetCrossRefMATHGoogle Scholar
  20. 20.
    Munro, I.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)CrossRefGoogle Scholar
  21. 21.
    Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comp. Surv. 39(1), article 2 (2007)Google Scholar
  22. 22.
    Raman, R., Raman, V., Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: SODA, pp. 233–242 (2002)Google Scholar
  23. 23.
    Sadakane, K.: Compressed text databases with efficient query algorithms based on the compressed suffix array. In: Lee, D.T., Teng, S.-H. (eds.) ISAAC 2000. LNCS, vol. 1969, pp. 295–321. Springer, Heidelberg (2000)CrossRefGoogle Scholar
  24. 24.
    Sadakane, K.: New text indexing functionalities of the compressed suffix arrays. J. Alg. 48(2), 294–313 (2003)MathSciNetCrossRefMATHGoogle Scholar
  25. 25.
    Sadakane, K.: Compressed suffix trees with full functionality. Theo. Comp. Sys. 41(4), 589–607 (2007)MathSciNetCrossRefMATHGoogle Scholar
  26. 26.
    Sadakane, K., Navarro, G.: Fully-functional succinct trees. In: SODA, pp. 134–149 (2010)Google Scholar
  27. 27.
    Weiner, P.: Linear pattern matching algorithm. In: Proc. Ann. IEEE Symp. on Switching and Automata Theory, pp. 1–11 (1973)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Djamal Belazzougui
    • 1
  • Gonzalo Navarro
    • 2
  1. 1.LIAFAUniv. Paris Diderot - Paris 7France
  2. 2.Department of Computer ScienceUniversity of ChileChile

Personalised recommendations