Abstract
The Burrows–Wheeler transform has applications in data compression as well as full-text indexing. Despite these important applications and the variety of existing algorithmic approaches, constructing the transform for large data sets remains challenging. In this paper we present a new semi-external memory algorithm for constructing the Burrows–Wheeler transform. It constructs the transform for an input text of length n over a finite alphabet in time \(O(n\log ^2\log n)\) on average, provided that sufficient internal memory is available to hold a fixed fraction of the input text. In the worst case the run time is \(O(n\log n \log \log n)\). The algorithm uses \(O(n)\) bits of space in external memory. Based on the serial version, we also present a shared-memory parallel algorithm running in time \(O(\frac{n}{p}\max \{\log ^2\log n+\log p\})\) on average when p processors are available.
Additional information
Supported by the Wellcome Trust.
Full version of an extended abstract which appeared in the Proceedings of the 2nd International Conference on Algorithms for Big Data.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Tischler, G. Faster Average Case Low Memory Semi-external Construction of the Burrows–Wheeler Transform. Math. Comput. Sci. 11, 159–176 (2017). https://doi.org/10.1007/s11786-017-0296-2