Advertisement

Computing the Burrows-Wheeler Transform of a String and Its Reverse

  • Enno Ohlebusch
  • Timo Beller
  • Mohamed I. Abouelhoda
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7354)

Abstract

The contribution of this paper is twofold. First, we provide new theoretical insights into the relationship between a string and its reverse: If the Burrows-Wheeler transform (BWT) of a string has been computed by sorting its suffixes, then the BWT and the longest common prefix array of the reverse string can be derived from it without suffix sorting. Furthermore, we show that the longest common prefix arrays of a string and its reverse are permutations of each other. Second, we provide a parallel algorithm that, given the BWT of a string, computes the BWT of its reverse much faster than all known (parallel) suffix sorting algorithms. Some bioinformatics applications will benefit from this.

Keywords

Internal Node Lexicographic Order Recursive Call Sorting Algorithm Procedure Call 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Beller, T., Gog, S., Ohlebusch, E., Schnattinger, T.: Computing the Longest Common Prefix Array Based on the Burrows-Wheeler Transform. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 197–208. Springer, Heidelberg (2011)CrossRefGoogle Scholar
  2. 2.
    Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm. Research Report 124, Digital Systems Research Center (1994)Google Scholar
  3. 3.
    Culpepper, J.S., Navarro, G., Puglisi, S.J., Turpin, A.: Top-k Ranked Document Search in General Text Databases. In: de Berg, M., Meyer, U. (eds.) ESA 2010. LNCS, vol. 6347, pp. 194–205. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  4. 4.
    Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proc. IEEE Symposium on Foundations of Computer Science, pp. 390–398 (2000)Google Scholar
  5. 5.
    Futamura, N., Aluru, S., Kurtz, S.: Parallel suffix sorting. In: Proc. 9th International Conference on Advanced Computing and Communications, pp. 76–81. IEEE (2001)Google Scholar
  6. 6.
    Gog, S., Ohlebusch, E.: Lightweight LCP-array construction in linear time (2011), http://arxiv.org/pdf/1012.4263
  7. 7.
    Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: Proc. 14th Annual Symposium on Discrete Algorithms, pp. 841–850 (2003)Google Scholar
  8. 8.
    Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, New York (1997)zbMATHCrossRefGoogle Scholar
  9. 9.
    Homann, R., Fleer, D., Giegerich, R., Rehmsmeier, M.: mkESA: Enhanced suffix array construction tool. Bioinformatics 25(8), 1084–1085 (2009)CrossRefGoogle Scholar
  10. 10.
    Kärkkäinen, J., Manzini, G., Puglisi, S.J.: Permuted Longest-Common-Prefix Array. In: Kucherov, G., Ukkonen, E. (eds.) CPM 2009. LNCS, vol. 5577, pp. 181–192. Springer, Heidelberg (2009)CrossRefGoogle Scholar
  11. 11.
    Kasai, T., Lee, G.H., Arimura, H., Arikawa, S., Park, K.: Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 181–192. Springer, Heidelberg (2001)CrossRefGoogle Scholar
  12. 12.
    Lam, T.-W., Li, R., Tam, A., Wong, S., Wu, E., Yiu, S.-M.: High throughput short read alignment via bi-directional BWT. In: Proc. International Conference on Bioinformatics and Biomedicine, pp. 31–36. IEEE Computer Society (2009)Google Scholar
  13. 13.
    Manzini, G.: Two Space Saving Tricks for Linear Time LCP Array Computation. In: Hagerup, T., Katajainen, J. (eds.) SWAT 2004. LNCS, vol. 3111, pp. 372–383. Springer, Heidelberg (2004)CrossRefGoogle Scholar
  14. 14.
    Puglisi, S.J., Smyth, W.F., Turpin, A.: A taxonomy of suffix array construction algorithms. ACM Computing Surveys 39(2), 1–31 (2007)CrossRefGoogle Scholar
  15. 15.
    Schnattinger, T., Ohlebusch, E., Gog, S.: Bidirectional Search in a String with Wavelet Trees. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 40–50. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  16. 16.
    Simpson, J.T., Durbin, R.: Efficient construction of an assembly string graph using the FM-index. Bioinformatics 26(12), i367–i373 (2010)CrossRefGoogle Scholar
  17. 17.
    Välimäki, N., Ladra, S., Mäkinen, V.: Approximate All-Pairs Suffix/Prefix Overlaps. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 76–87. Springer, Heidelberg (2010)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Enno Ohlebusch
    • 1
  • Timo Beller
    • 1
  • Mohamed I. Abouelhoda
    • 2
    • 3
  1. 1.Institute of Theoretical Computer ScienceUniversity of UlmUlmGermany
  2. 2.Center for Informatics SciencesNile UniversityGizaEgypt
  3. 3.Faculty of EngineeringCairo UniversityGizaEgypt

Personalised recommendations