Versatile Succinct Representations of the Bidirectional Burrows-Wheeler Transform

  • Djamal Belazzougui
  • Fabio Cunial
  • Juha Kärkkäinen
  • Veli Mäkinen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8125)

Abstract

We describe succinct and compact representations of the bidirectional BWT of a string s ∈ Σ* that provide increasing navigation power and a number of space-time tradeoffs. One such representation allows a substring of s to be extended by one character from the left and from the right in constant time, taking O(|s| log |Σ|) bits of space. We then match the functions supported by each representation to a number of algorithms that traverse the nodes of the suffix tree of s, exploiting connections between the BWT and the suffix-link tree. This yields near-linear time algorithms for many sequence analysis problems (e.g. maximal unique matches), for the first time in succinct space.
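To make the left and right extension operation concrete, here is a minimal Python sketch of a bidirectional BWT index. It is not the succinct representation described in the paper: it stores both BWTs as plain strings and answers rank queries by scanning, so each extension takes linear rather than constant time. The class name `NaiveBidirectionalBWT`, its methods, and the `(fwd_lo, rev_lo, size)` interval encoding are assumptions made for illustration only.

```python
class NaiveBidirectionalBWT:
    """Toy bidirectional BWT index (illustrative; linear-time extensions)."""

    def __init__(self, s, sentinel="$"):
        # Assumption: the sentinel is absent from s and sorts before every
        # character of s.
        assert sentinel not in s
        text = s + sentinel
        self.n = len(text)
        self.fwd = self._bwt(text)               # BWT of s$
        self.rev = self._bwt(s[::-1] + sentinel)  # BWT of reverse(s)$
        # C[c] = number of characters in the text strictly smaller than c.
        self.C = {c: sum(1 for d in text if d < c) for c in set(text)}

    @staticmethod
    def _bwt(t):
        # Naive suffix sorting; t[k - 1] wraps around to the sentinel at k = 0.
        sa = sorted(range(len(t)), key=lambda k: t[k:])
        return "".join(t[k - 1] for k in sa)

    def start(self):
        # Interval of the empty string: every suffix in both indexes.
        return (0, 0, self.n)

    def _extend(self, bwt, lo, other_lo, size, c):
        if c not in self.C:
            return (0, 0, 0)
        window = bwt[lo:lo + size]
        # Standard backward-search step in the index that owns `bwt`.
        new_lo = self.C[c] + bwt[:lo].count(c)
        new_size = window.count(c)
        # In the other index the interval only shrinks: skip the occurrences
        # that continue with a character smaller than c.
        new_other = other_lo + sum(1 for d in window if d < c)
        return (new_lo, new_other, new_size)

    def extend_left(self, iv, c):
        """Interval of W -> interval of cW."""
        f, r, k = iv
        return self._extend(self.fwd, f, r, k, c)

    def extend_right(self, iv, c):
        """Interval of W -> interval of Wc."""
        f, r, k = iv
        nr, nf, nk = self._extend(self.rev, r, f, k, c)
        return (nf, nr, nk)


idx = NaiveBidirectionalBWT("banana")
iv = idx.extend_left(idx.start(), "n")   # "n"
iv = idx.extend_left(iv, "a")            # "an"
iv = idx.extend_right(iv, "a")           # "ana"
print(iv[2])  # occurrences of "ana" in "banana": 2
```

The key point the sketch illustrates is that the two intervals stay synchronized: a backward-search step in one index narrows the interval there, while the matching interval in the other index is obtained by counting the smaller characters inside the current BWT window.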

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Djamal Belazzougui (1)
  • Fabio Cunial (1)
  • Juha Kärkkäinen (1)
  • Veli Mäkinen (1)
  1. Helsinki Institute for Information Technology (HIIT), Department of Computer Science, University of Helsinki, Finland
