Computing the Longest Common Prefix Array Based on the Burrows-Wheeler Transform

Beller, Timo; Gog, Simon; Ohlebusch, Enno; Schnattinger, Thomas

doi:10.1007/978-3-642-24583-1_20

Timo Beller¹⁸,
Simon Gog¹⁸,
Enno Ohlebusch¹⁸ &
…
Thomas Schnattinger¹⁸

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7024))

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

792 Accesses
6 Citations

Abstract

Many sequence analysis tasks can be accomplished with a suffix array, and several of them additionally need the longest common prefix array. In large scale applications, suffix arrays are being replaced with full-text indexes that are based on the Burrows-Wheeler transform. In this paper, we present the first algorithm that computes the longest common prefix array directly on the wavelet tree of the Burrows-Wheeler transformed string. It runs in linear time and a practical implementation requires approximately 2.2 bytes per character.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Brisaboa, N.R., Ladra, S., Navarro, G.: Directly addressable variable-length codes. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds.) SPIRE 2009. LNCS, vol. 5721, pp. 122–130. Springer, Heidelberg (2009)
Chapter Google Scholar
Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm. Research Report 124, Digital Systems Research Center (1994)
Google Scholar
Culpepper, J.S., Navarro, G., Puglisi, S.J., Turpin, A.: Top-k ranked document search in general text databases. In: de Berg, M., Meyer, U. (eds.) ESA 2010. LNCS, vol. 6347, pp. 194–205. Springer, Heidelberg (2010)
Chapter Google Scholar
Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: IEEE Symposium on Foundations of Computer Science, pp. 390–398 (2000)
Google Scholar
Flick, P., Birney, E.: Sense from sequence reads: Methods for alignment and assembly. Nature Methods 6(11 suppl.), S6–S12 (2009)
Article Google Scholar
Gog, S., Ohlebusch, E.: Lightweight LCP-array construction in linear time (2011), arxiv.org/pdf/1012.4263
Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: Proc.14th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 841–850 (2003)
Google Scholar
Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, New York (1997)
Book MATH Google Scholar
Herold, J., Kurtz, S., Giegerich, R.: Efficient computation of absent words in genomic sequences. BMC Bioinformatics 9, 167 (2008)
Article Google Scholar
Jacobson, G.: Space-efficient static trees and graphs. In: Proc. 30th Annual Symposium on Foundations of Computer Science, pp. 549–554. IEEE, Los Alamitos (1989)
Chapter Google Scholar
Kärkkäinen, J., Manzini, G., Puglisi, S.J.: Permuted longest-common-prefix array. In: Kucherov, G., Ukkonen, E. (eds.) CPM 2009 Lille. LNCS, vol. 5577, pp. 181–192. Springer, Heidelberg (2009)
Chapter Google Scholar
Kasai, T., Lee, G.H., Arimura, H., Arikawa, S., Park, K.: Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 181–192. Springer, Heidelberg (2001)
Chapter Google Scholar
Lippert, R.A.: Space-efficient whole genome comparisons with Burrows-Wheeler transforms. Journal of Computational Biology 12(4), 407–415 (2005)
Article Google Scholar
Manber, U., Myers, E.W.: Suffix arrays: A new method for on-line string searches. SIAM Journal on Computing 22(5), 935–948 (1993)
Article MathSciNet MATH Google Scholar
Manzini, G.: Two space saving tricks for linear time LCP array computation. In: Hagerup, T., Katajainen, J. (eds.) SWAT 2004. LNCS, vol. 3111, pp. 372–383. Springer, Heidelberg (2004)
Chapter Google Scholar
Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Computing Surveys 39(1), Article 2 (2007)
Article MATH Google Scholar
Nong, G., Zhang, S., Chan, W.H.: Linear suffix array construction by almost pure induced-sorting. In: Proc. Data Compression Conference, pp. 193–202. IEEE Computer Society, Los Alamitos (2009)
Google Scholar
Ohlebusch, E., Gog, S., Kügel, A.: Computing matching statistics and maximal exact matches on compressed full-text indexes. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 347–358. Springer, Heidelberg (2010)
Chapter Google Scholar
Okanohara, D., Sadakane, K.: A linear-time burrows-wheeler transform using induced sorting. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds.) SPIRE 2009. LNCS, vol. 5721, pp. 90–101. Springer, Heidelberg (2009)
Chapter Google Scholar
Puglisi, S.J., Smyth, W.F., Turpin, A.: A taxonomy of suffix array construction algorithms. ACM Computing Surveys 39(2), 1–31 (2007)
Article Google Scholar
Puglisi, S.J., Turpin, A.: Space-time tradeoffs for longest-common-prefix array computation. In: Hong, S.-H., Nagamochi, H., Fukunaga, T. (eds.) ISAAC 2008. LNCS, vol. 5369, pp. 124–135. Springer, Heidelberg (2008)
Chapter Google Scholar
Schnattinger, T.: Bidirektionale indexbasierte Suche in Texten. Diploma thesis, University of Ulm, Germany (2010)
Google Scholar
Simpson, J.T., Durbin, R.: Efficient construction of an assembly string graph using the FM-index. Bioinformatics 26(12), i367–i373 (2010)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Institute of Theoretical Computer Science, University of Ulm, D-89069, Ulm, Germany
Timo Beller, Simon Gog, Enno Ohlebusch & Thomas Schnattinger

Authors

Timo Beller
View author publications
You can also search for this author in PubMed Google Scholar
Simon Gog
View author publications
You can also search for this author in PubMed Google Scholar
Enno Ohlebusch
View author publications
You can also search for this author in PubMed Google Scholar
Thomas Schnattinger
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Università di Pisa, Italy
Roberto Grossi
Consiglio Nazionale delle Ricerche, Area della Ricerca di Pisa, Istituto di Scienza e Tecnologia dell’Informazione “Alessandro Faedo”, Via Giuseppe Moruzzi 1, 56124, Pisa, Italy
Fabrizio Sebastiani & Fabrizio Silvestri &

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Beller, T., Gog, S., Ohlebusch, E., Schnattinger, T. (2011). Computing the Longest Common Prefix Array Based on the Burrows-Wheeler Transform. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds) String Processing and Information Retrieval. SPIRE 2011. Lecture Notes in Computer Science, vol 7024. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24583-1_20

Download citation

DOI: https://doi.org/10.1007/978-3-642-24583-1_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24582-4
Online ISBN: 978-3-642-24583-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics