Abstract
We design a new compressed suffix tree specifically tailored to highly repetitive text collections. This is particularly useful for sequence analysis on large collections of genomes of the close species. We build on an existing compressed suffix tree that applies statistical compression, and modify it so that it works on the grammar-compressed version of the longest common prefix array, whose differential version inherits much of the repetitiveness of the text.
Partially funded by Fondecyt grant 1-080019 and Millennium Institute for Cell Dynamics and Biotechnology (ICDB), Grant ICM P05-001-F, Mideplan, Chile.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Apostolico, A.: The myriad virtues of subword trees, pp. 85–96. Combinatorial Algorithms on Words. NATO ISI Series. Springer (1985)
Brisaboa, N.R., Ladra, S., Navarro, G.: Directly Addressable Variable-Length Codes. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds.) SPIRE 2009. LNCS, vol. 5721, pp. 122–130. Springer, Heidelberg (2009)
Cánovas, R., Navarro, G.: Practical Compressed Suffix Trees. In: Festa, P. (ed.) SEA 2010. LNCS, vol. 6049, pp. 94–105. Springer, Heidelberg (2010)
Claude, F., Fariña, A., Martínez-Prieto, M., Navarro, G.: Compressed q-gram indexing for highly repetitive biological sequences. In: Proc. 10th BIBE, pp. 86–91 (2010)
Claude, F., Navarro, G.: Self-indexed Text Compression Using Straight-Line Programs. In: Královič, R., Niwiński, D. (eds.) MFCS 2009. LNCS, vol. 5734, pp. 235–246. Springer, Heidelberg (2009)
Ferragina, P., González, R., Navarro, G., Venturini, R.: Compressed text indexes: From theory to practice. ACM J. Exp. Algor. 13, article 12 (2009)
Fischer, J.: Wee LCP. Inf. Proc. Lett. 110, 317–320 (2010)
Fischer, J., Mäkinen, V., Navarro, G.: Faster entropy-bounded compressed suffix trees. Theor. Comp. Sci. 410(51), 5354–5364 (2009)
Gog, S.: Compressed Suffix Trees: Design, Construction, and Applications. Ph.D. thesis, Univ. of Ulm, Germany (2011)
González, R., Navarro, G.: Compressed Text Indexes with Fast Locate. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 216–227. Springer, Heidelberg (2007)
Gusfield, D.: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press (1997)
Kreft, S.: Self-index based on LZ77. Master’s thesis, Univ. of Chile (2010), arXiv:1112.4578v1
Kreft, S., Navarro, G.: Self-indexing Based on LZ77. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 41–54. Springer, Heidelberg (2011)
Kurtz, S.: Reducing the space requirements of suffix trees. Report 98-03, Technische Fakultät, Univ. Bielefeld, Germany (1998)
Larsson, J., Moffat, A.: Off-line dictionary-based compression. Proc. of the IEEE 88(11), 1722–1732 (2000)
Mäkinen, V., Navarro, G.: Succinct suffix arrays based on run-length encoding. Nordic J. Comp. 12(1), 40–66 (2005)
Mäkinen, V., Navarro, G., Sirén, J., Välimäki, N.: Storage and retrieval of highly repetitive sequence collections. J. Comp. Biol. 17(3), 281–308 (2010)
Manber, U., Myers, E.: Suffix arrays: a new method for on-line string searches. SIAM J. Comp., 935–948 (1993)
Manzini, G.: An analysis of the Burrows-Wheeler transform. J. ACM 48(3), 407–430 (2001)
McCreight, E.: A space-economical suffix tree construction algorithm. J. ACM 32(2), 262–272 (1976)
Munro, I.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)
Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comp. Surv. 39(1), article 2 (2007)
Ohlebusch, E., Fischer, J., Gog, S.: CST++. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 322–333. Springer, Heidelberg (2010)
Raman, R., Raman, V., Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proc. 13th SODA, pp. 233–242 (2002)
Russo, L., Navarro, G., Oliveira, A.: Fully-compressed suffix trees. ACM Trans. Alg. 7(4), article 53 (2011)
Sadakane, K.: Compressed suffix trees with full functionality. Theor. Comp. Sys. 41(4), 589–607 (2007)
Weiner, P.: Linear pattern matching algorithms. In: IEEE Symp. Swit. and Aut. Theo., pp. 1–11 (1973)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abeliuk, A., Navarro, G. (2012). Compressed Suffix Trees for Repetitive Texts. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds) String Processing and Information Retrieval. SPIRE 2012. Lecture Notes in Computer Science, vol 7608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34109-0_5
Download citation
DOI: https://doi.org/10.1007/978-3-642-34109-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34108-3
Online ISBN: 978-3-642-34109-0
eBook Packages: Computer ScienceComputer Science (R0)