Memory-Aware BWT by Segmenting Sequences to Support Subsequence Search
The Burrows-Wheeler transform (BWT) has received significant attention in academia for addressing subsequence matching problems. Although BWT is typically used to transform a sequence into a new sequence that is “easy to compress”, it can also be extended into a full-text index. A traditional BWT index requires n log n + n log σ bits for a sequence of n characters, where σ is the size of the alphabet. Building a BWT index for a long sequence on PCs with limited memory is therefore a great challenge. To solve this problem, we propose a novel variation of the BWT index named S-BWT, which separates the source sequence into segments. It reduces the memory cost to n(log σ + log n − log k)/k bits, where k is the number of segments. However, querying each segment separately with existing approaches risks losing some significant results. In this paper, we propose two query methods based on S-BWT that are guaranteed to find all subsequence occurrences. Our methods not only require little memory but are also faster than the state-of-the-art BWT backward search method for long sequences.
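To make the two claims above concrete, the following is a minimal sketch, not the S-BWT method itself. It evaluates the two memory formulas from the abstract (assuming base-2 logarithms, which the abstract does not specify) and shows why searching equal-length segments independently can miss occurrences that cross a segment boundary. All function names are hypothetical, and plain string search stands in for a real BWT index.

```python
import math


def bwt_index_bits(n, sigma):
    # Traditional BWT index cost: n log n + n log sigma bits (base 2 assumed).
    return n * math.log2(n) + n * math.log2(sigma)


def sbwt_index_bits(n, sigma, k):
    # S-BWT cost stated in the abstract: n (log sigma + log n - log k) / k bits.
    return n * (math.log2(sigma) + math.log2(n) - math.log2(k)) / k


def find_all(text, pattern):
    # All start positions of pattern in text, overlapping matches included.
    hits = []
    pos = text.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1)
    return hits


def naive_segment_search(text, pattern, k):
    # Illustrative baseline only: split into k equal-length segments and
    # search each one in isolation, ignoring boundary-crossing matches.
    seg_len = math.ceil(len(text) / k)
    hits = []
    for start in range(0, len(text), seg_len):
        segment = text[start:start + seg_len]
        hits.extend(start + p for p in find_all(segment, pattern))
    return hits


n, sigma, k = 2**20, 4, 16
print(bwt_index_bits(n, sigma))      # 23068672.0
print(sbwt_index_bits(n, sigma, k))  # 1179648.0

text = "ACGTACGTACGT"
print(find_all(text, "TACG"))                 # [3, 7]
print(naive_segment_search(text, "TACG", 3))  # []
```

In this toy input both occurrences of `TACG` straddle a segment boundary, so the per-segment baseline reports nothing. This is the kind of lost result that the two proposed query methods are designed to avoid.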
Keywords: BWT, subsequence matching, full text index